[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-05-14 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r30359968
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

FYI most systems don't support leap seconds. I'm not sure why we'd want to 
support them here...
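
To illustrate the point about leap seconds, here is a minimal sketch using `java.time` from the JDK. This is an assumption made for illustration only: the PR relies on jodd, and the object and member names below are hypothetical. It shows a common time API counting exactly one second across a real leap second and rejecting a second-of-minute value of 60:

```scala
import java.time.{DateTimeException, Duration, Instant, LocalTime}

object LeapSecondSketch {
  // A leap second was inserted at the end of 2012-06-30 (UTC), yet the two
  // instants on either side of it are exactly one second apart in java.time.
  val acrossLeapSecond: Duration = Duration.between(
    Instant.parse("2012-06-30T23:59:59Z"),
    Instant.parse("2012-07-01T00:00:00Z"))

  // Returns true when a second-of-minute value of 60 is rejected.
  def rejectsSecondSixty: Boolean =
    try { LocalTime.of(23, 59, 60); false }
    catch { case _: DateTimeException => true }
}
```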


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-05-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r30381187
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

This is mainly to make sure we are compatible with Hive.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-05-14 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r30381621
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

Does Hive support leap seconds? I looked into the implementation of jodd, and
I don't think it supports them when doing date/timestamp conversion. I could be
wrong though.






[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-05-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r30382532
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

If we are going to do this conversion ourselves, I think it is fine...





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72611319
  
  [Test build #26622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26622/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3820





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-03 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72724023
  
Thanks!  Merged to master.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72573565
  
I tried the newly uploaded Parquet data in
https://issues.apache.org/jira/browse/SPARK-4768 (with my timezone set to UTC).
For one row, I got
```
[test row 5,2015-01-02 20:54:10.000456789]
```
But, the data was generated by 
```
insert into string_timestamp (dummy,timestamp1) values('test row 5', 
'2015-01-02 20:54:10.123456789');
```
Can you take a look?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72573027
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26568/
Test FAILed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72573025
  
  [Test build #26568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26568/consoleFull) for PR 3820 at commit [`5152f2a`](https://github.com/apache/spark/commit/5152f2ab02a4d27be0a5e540f73aab166774fe2e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72572992
  
I just rebased my code and upgraded jodd to 3.6.3.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23972062
  
--- Diff: pom.xml ---
@@ -149,6 +149,7 @@
 <scala.binary.version>2.10</scala.binary.version>
 <jline.version>${scala.version}</jline.version>
 <jline.groupid>org.scala-lang</jline.groupid>
+<jodd.version>3.5.2</jodd.version>
--- End diff --

Any reason not to use the latest version (3.6.3)?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72567388
  
Dependency looks fine to me, thanks for running it by.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72572873
  
  [Test build #26568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26568/consoleFull) for PR 3820 at commit [`5152f2a`](https://github.com/apache/spark/commit/5152f2ab02a4d27be0a5e540f73aab166774fe2e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23972716
  
--- Diff: pom.xml ---
@@ -149,6 +149,7 @@
 <scala.binary.version>2.10</scala.binary.version>
 <jline.version>${scala.version}</jline.version>
 <jline.groupid>org.scala-lang</jline.groupid>
+<jodd.version>3.5.2</jodd.version>
--- End diff --

To stay aligned with Hive 0.14.0; it's not strictly necessary, though.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72594576
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26597/
Test FAILed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72588533
  
  [Test build #26583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26583/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LogisticGradient(numClasses: Int) extends Gradient `
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split(')(1)`






[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72588539
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26583/
Test FAILed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72594570
  
  [Test build #26597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26597/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72582056
  
  [Test build #26583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26583/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72581952
  
@yhuai Sorry, I misunderstood the setNanos API; it works OK now.
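
For reference, a minimal sketch of the `java.sql.Timestamp.setNanos` semantics involved. The helper names are hypothetical and this is not the PR's code. The `.000456789` value reported by @yhuai is consistent with the second variant, though the thread does not confirm that this was the actual mistake:

```scala
import java.sql.Timestamp

object SetNanosSketch {
  // Hypothetical helper: build a Timestamp from whole seconds since the epoch
  // plus the full nanosecond-of-second fraction (0 to 999999999).
  def fromSecondsAndNanos(seconds: Long, nanosOfSecond: Int): Timestamp = {
    val ts = new Timestamp(seconds * 1000L)
    // setNanos replaces the entire fractional second, milliseconds included.
    ts.setNanos(nanosOfSecond)
    ts
  }

  // Mistaken variant: passing only the sub-millisecond remainder wipes out the
  // milliseconds that the constructor had already stored.
  def subMillisOnly(millis: Long, nanosOfSecond: Int): Timestamp = {
    val ts = new Timestamp(millis)
    ts.setNanos(nanosOfSecond % 1000000)
    ts
  }
}
```

The key point is that `setNanos` takes the whole fraction of the second, not just the part below one millisecond.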





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72601905
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72588631
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72588700
  
  [Test build #26597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26597/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-72602402
  
  [Test build #26622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26622/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-28 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-71797016
  
ping





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-71004690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25963/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-71004681
  
  [Test build #25963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25963/consoleFull) for PR 3820 at commit [`5d1eeed`](https://github.com/apache/spark/commit/5d1eeedd43d60d9ef9c5dcc0e97fff829ccea8ed).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-22 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70992950
  
I have fixed the bug.
This is quite embarrassing: I forgot to convert those factors (NANOS_PER_SECOND,
SECONDS_PER_MINUTE, MINUTES_PER_HOUR) to Long when dividing, so the computation
overflowed. I have tested it, and it now works fine.
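
A minimal sketch of the kind of overflow described here. The constant names follow the comment; the object name, the derived values, and the `hourOfDay` helper are hypothetical and are not taken from the PR:

```scala
object NanosOverflowSketch {
  // Constant names follow the comment above; the values are the usual unit factors.
  val NANOS_PER_SECOND: Int = 1000000000
  val SECONDS_PER_MINUTE: Int = 60
  val MINUTES_PER_HOUR: Int = 60

  // Buggy: the product is evaluated in 32-bit Int arithmetic and wraps around
  // before it is widened to Long.
  val nanosPerHourInt: Long = NANOS_PER_SECOND * SECONDS_PER_MINUTE * MINUTES_PER_HOUR

  // Fixed: widening the first factor forces the whole product into Long arithmetic.
  val nanosPerHourLong: Long = NANOS_PER_SECOND.toLong * SECONDS_PER_MINUTE * MINUTES_PER_HOUR

  // Hypothetical helper: hour of day from a nanoseconds-of-day value.
  def hourOfDay(nanosOfDay: Long): Long = nanosOfDay / nanosPerHourLong
}
```

Declaring the factors as `Long` in the first place would avoid the problem without needing a conversion at each use.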





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70992996
  
  [Test build #25963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25963/consoleFull) for PR 3820 at commit [`5d1eeed`](https://github.com/apache/spark/commit/5d1eeedd43d60d9ef9c5dcc0e97fff829ccea8ed).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-21 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70904105
  
Good to hear. Here's how I create my test data: I run this in Hive, then take
the data file from HDFS directly. Spark is able to read and parse it (with the
issue above):

Set parquet.compression = SNAPPY;

DROP TABLE testdata;

CREATE TABLE testdata
STORED AS PARQUET
AS SELECT a.*, from_utc_timestamp('1970-01-01 08:00:00','PST') as timestamp
FROM sample_07 AS a;


I have looked into this a fair bit and attempted a fix. Thanks for working 
on this fix and let me know if I could help in any way.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23207875
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala ---
@@ -294,14 +294,14 @@ class ParquetQuerySuite extends QueryTest with FunSuiteLike with BeforeAndAfterA
 // Check to make sure that the attributes from either side of the join have unique expression
 // ids.
 query.queryExecution.analyzed.output.filter(_.name == "myint") match {
-  case Seq(i1, i2) if(i1.exprId == i2.exprId) =>
+  case Seq(i1, i2) if i1.exprId == i2.exprId =>
--- End diff --

Just a reminder: `ParquetQuerySuite` is considered deprecated, and I'm going
to remove it in #4116. I should have added a comment to make this clear. I didn't
remove it at first because several Parquet PRs were open and removing it might
have caused unwanted merge conflicts.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23207993
  
--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

I'm probably OK with this property. Just want to ask: doesn't Impala store the
original type information (TIMESTAMP) alongside the data in the Parquet metadata?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23256480
  
--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

From my digging, only parquet-format 2.2 has the TIMESTAMP and
TIMESTAMP_MILLIS types; Cloudera is still on 1.5.0.
Hive and Impala have been writing this INT96 nanosecond format, which is different.






[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70791893
  
@adrian-wang Can you try the parquet file uploaded in 
https://issues.apache.org/jira/browse/SPARK-4768? Thanks!





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70789269
  
@felixcheung I have tested my code with the Hive 0.14.0 release.
I first tried to generate the data with Hive, but Spark needs metadata to read 
Hive's data.
So I generated the data with Spark, saved it as a Parquet file, copied the 
saved file into my Hive warehouse dir on HDFS, and ran `create table` in the 
Hive CLI. Reading the data back with `select` worked fine.

How did you read data from hive? Can you show me how to reproduce this?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70795739
  
@yhuai Thanks for the link. I'm afraid the attachment contains only the 
last row, in which the timestamp field is null. Anyway, I have reproduced the 
bug @felixcheung mentioned by reading my Hive-generated Parquet records. It's 
strange that Hive can read this PR's output correctly but this PR cannot read 
Hive's output correctly... I'll look into that.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23247492
  
--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

Oh, I see the difference here. Double checked, Parquet only provides 
TIMESTAMP and TIMESTAMP_MILLIS





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23246274
  
--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

Do they?  As far as I can tell the parquet spec does not have a nanosecond
precision timestamp type.
On Jan 20, 2015 12:26 AM, Cheng Lian notificati...@github.com wrote:

 In docs/sql-programming-guide.md
 https://github.com/apache/spark/pull/3820#discussion-diff-23207993:

 I'm probably OK with this property. Just wanna ask doesn't Impala store
 original type information (TIMESTAMP) together in Parquet metainfo?







[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-20 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r23247815
  
--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

Yeah, I agree that it's weird though.  Perhaps we should ask the parquet
list why they don't support the int 96 version.
On Jan 20, 2015 11:21 AM, Cheng Lian notificati...@github.com wrote:

 In docs/sql-programming-guide.md
 https://github.com/apache/spark/pull/3820#discussion-diff-23247492:

 Oh, I see the difference here. Double checked, Parquet only provides
 TIMESTAMP and TIMESTAMP_MILLIS







[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-18 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70450284
  
I've tested this PR, but the result seems to be off.
The Parquet file was generated from Hive with timestamp values set by 
`from_utc_timestamp('1970-01-01 08:00:00','PST')`.

What I see with this PR:
scala> t.take(10).foreach(println(_))
...
15/01/18 22:06:41 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: 
file:/users/x/parquetwithtimestamp start: 0 end: 25448 length: 25448 hosts: [] 
requestedSchema: message root {
  optional binary code (UTF8);
  optional binary description (UTF8);
  optional int32 total_emp;
  optional int32 salary;
  optional int96 timestamp;
}
 readSupportMetadata: 
{org.apache.spark.sql.parquet.row.metadata={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]},
 
org.apache.spark.sql.parquet.row.requested_schema={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}}}
15/01/18 22:06:41 WARN ParquetRecordReader: Can not initialize counter due 
to context is not a instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
15/01/18 22:06:41 INFO InternalParquetRecordReader: RecordReader 
initialized will read a total of 823 records.
15/01/18 22:06:41 INFO InternalParquetRecordReader: at row 0. reading next 
block
15/01/18 22:06:41 INFO CodecPool: Got brand-new decompressor [.snappy]
15/01/18 22:06:41 INFO InternalParquetRecordReader: block read in memory in 
27 ms. row count = 823
[00-,All Occupations,134354250,40690,1974-01-07 17:58:00.08896]
[11-,Management occupations,6003930,96150,1974-01-07 17:58:00.08896]

Expected: 1970-01-01 08:00:00

Actual: 1974-01-07 17:58:00.08896

Any idea?






[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70052102
  
  [Test build #25598 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25598/consoleFull)
 for   PR 3820 at commit 
[`a309058`](https://github.com/apache/spark/commit/a309058a618feb86813b4ac0df087e5447b37485).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-15 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70053453
  
Thanks Yin, I have fixed that.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70061688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25598/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70061677
  
  [Test build #25598 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25598/consoleFull)
 for   PR 3820 at commit 
[`a309058`](https://github.com/apache/spark/commit/a309058a618feb86813b4ac0df087e5447b37485).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ExperimentalMethods protected[sql](sqlContext: SQLContext) `






[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69703883
  
Can you add unit tests? I tried it and got 
```
java.lang.RuntimeException: unable to convert datatype TimestampType in 
CatalystConverter
```
I think you need to update 
`org.apache.spark.sql.parquet.CatalystConverter#createConverter`.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-11 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22773648
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

I have read the joda-time documentation. The toJulianDayNumber API is only 
available since 2.2, but Spark uses 2.1.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-10 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22762553
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

Okay, I talked to @pwendell and we think it would be better to use Joda 
time if possible since spark already depends on that library in other 
subprojects.  What do you think?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-09 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22743372
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

Okay, you are right that it's a bad idea to do this by hand.  Are there any 
dependencies that Spark SQL already has that could be used instead?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69150651
  
  [Test build #25211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25211/consoleFull)
 for   PR 3820 at commit 
[`8526a33`](https://github.com/apache/spark/commit/8526a33d254fc4181c37b7e4b7976d1d48b8eaa7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-08 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22640517
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

Writing it by hand may be dangerous because of leap seconds 
[https://en.wikipedia.org/wiki/Leap_second], since the computed date could be 
inaccurate.
Also, judging by the 
pom [http://repo1.maven.org/maven2/org/jodd/jodd-core/3.5.2/jodd-core-3.5.2.pom] 
of jodd-core, it has no additional dependencies, so the impact is 
comparatively small. Using jodd for the conversion also keeps everything 
consistent with Hive, so we can be 100% compatible with data generated by 
Hive. So I'd prefer to keep this.
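
For reference, the by-hand conversion under discussion can be sketched roughly as follows. This is a minimal sketch and not code from this PR: the `JulianDaySketch` object and its method names are hypothetical, and it assumes that 1970-01-01 has Julian day number 2440588 and that the input is epoch milliseconds, which (like Unix time) do not count leap seconds, so every day is treated as exactly 86,400,000 ms.

```scala
// Hypothetical helper, for illustration only; not part of Spark, Hive, or jodd.
object JulianDaySketch {
  // Assumption: Julian day number assigned to 1970-01-01 (the Unix epoch).
  val JulianDayOfEpoch: Long = 2440588L
  val MillisPerDay: Long = 24L * 60 * 60 * 1000
  val NanosPerMilli: Long = 1000000L

  // Split epoch milliseconds into (Julian day number, nanoseconds within that day).
  // floorDiv/floorMod keep timestamps before 1970 on the correct day.
  def fromEpochMillis(millis: Long): (Int, Long) = {
    val daysSinceEpoch = Math.floorDiv(millis, MillisPerDay)
    val millisOfDay = Math.floorMod(millis, MillisPerDay)
    ((daysSinceEpoch + JulianDayOfEpoch).toInt, millisOfDay * NanosPerMilli)
  }

  // Inverse direction; sub-millisecond nanoseconds are truncated.
  def toEpochMillis(julianDay: Int, nanosOfDay: Long): Long =
    (julianDay - JulianDayOfEpoch) * MillisPerDay + nanosOfDay / NanosPerMilli
}
```

Under these assumptions `JulianDaySketch.fromEpochMillis(0L)` yields `(2440588, 0L)`. Whether this matches what jodd and Hive produce for every input is exactly the compatibility question raised above, so it would need to be checked against Hive-generated files.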





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-08 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69151953
  
The [NanoTime in 
Hive-0.14](https://github.com/apache/hive/blob/f64d707e4d59aafbcf7e9b82ed199f6781d97946/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java)
 is a little bit different from the NanoTime in parquet-examples, and here are some 
[related 
discussions](https://issues.apache.org/jira/browse/HIVE-7263?focusedCommentId=14038502&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038502).
 So I just rewrote Hive's `NanoTime` in Scala instead of using the NanoTime 
from parquet-examples.
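
For readers following along, the shape of such a port can be sketched as below. This is a minimal sketch and not the code from this PR: the `NanoTimeSketch` name and its methods are hypothetical, and it assumes the 12-byte INT96 layout that, as far as I understand, Hive's NanoTime writes: 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical stand-in for a Scala port of Hive's NanoTime; illustration only.
final case class NanoTimeSketch(julianDay: Int, timeOfDayNanos: Long) {
  // Serialize to 12 bytes: nanoseconds of day first, then the Julian day,
  // both in little-endian byte order (assumed layout, see the note above).
  def toBytes: Array[Byte] = {
    val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(timeOfDayNanos)
    buf.putInt(julianDay)
    buf.array()
  }
}

object NanoTimeSketch {
  // Parse the same 12-byte layout back into its two fields.
  def fromBytes(bytes: Array[Byte]): NanoTimeSketch = {
    require(bytes.length == 12, s"INT96 value must be 12 bytes, got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanos = buf.getLong()
    val day = buf.getInt()
    NanoTimeSketch(day, nanos)
  }
}
```

A round trip such as `NanoTimeSketch.fromBytes(NanoTimeSketch(2440588, 0L).toBytes)` gives back the original value; 2440588 is the Julian day number commonly assigned to 1970-01-01.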





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69160063
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25211/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69160056
  
  [Test build #25211 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25211/consoleFull)
 for   PR 3820 at commit 
[`8526a33`](https://github.com/apache/spark/commit/8526a33d254fc4181c37b7e4b7976d1d48b8eaa7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69003778
  
  [Test build #25161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25161/consoleFull)
 for   PR 3820 at commit 
[`5cb8f97`](https://github.com/apache/spark/commit/5cb8f97da9317e5106a0f6fb4214a902f952a1ae).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69013582
  
  [Test build #25161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25161/consoleFull)
 for   PR 3820 at commit 
[`5cb8f97`](https://github.com/apache/spark/commit/5cb8f97da9317e5106a0f6fb4214a902f952a1ae).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-69013592
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25161/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22636915
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -138,7 +139,13 @@ private[sql] trait SQLConf {
    * When set to true, we always treat byte arrays in Parquet files as strings.
    */
   private[spark] def isParquetBinaryAsString: Boolean =
-    getConf(PARQUET_BINARY_AS_STRING, false).toBoolean
+    getConf(PARQUET_BINARY_AS_STRING, true).toBoolean
--- End diff --

Oh, this is a mistake... I'll fix it in the next commit. What I actually 
meant to change is the value that follows.
Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22636339
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -138,7 +139,13 @@ private[sql] trait SQLConf {
    * When set to true, we always treat byte arrays in Parquet files as strings.
    */
   private[spark] def isParquetBinaryAsString: Boolean =
-    getConf(PARQUET_BINARY_AS_STRING, false).toBoolean
+    getConf(PARQUET_BINARY_AS_STRING, true).toBoolean
--- End diff --

Why this change?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22624670
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

I'm pretty hesitant to add a dependency here, as dependencies are very costly 
for a project as big as Spark. Is there any way to do this without adding one?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-07 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22630330
  
--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
   <version>2.3.0</version>
 </dependency>
 <dependency>
+  <groupId>org.jodd</groupId>
+  <artifactId>jodd-core</artifactId>
+  <version>${jodd.version}</version>
+</dependency>
--- End diff --

We can also convert to/from Julian dates ourselves... I'll draft it.
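
For reference, a minimal sketch of what a hand-rolled conversion might look like, without any extra dependency. It assumes the convention that Julian day 2440588 corresponds to 1970-01-01 (UTC) and that the time of day is carried as nanoseconds since midnight. The object and method names are hypothetical and are not taken from this PR.

```scala
// Hypothetical helper, for illustration only; not the code in this PR.
object JulianSketch {
  // Assumption: Julian day number assigned to 1970-01-01 (UTC).
  val JulianDayOfEpoch = 2440588L
  val MillisPerDay = 86400000L
  val NanosPerMilli = 1000000L

  // Splits epoch milliseconds into (Julian day, nanoseconds within that day).
  // floorDiv/floorMod keep the time-of-day part non-negative before 1970.
  def toJulian(epochMillis: Long): (Int, Long) = {
    val daysSinceEpoch = Math.floorDiv(epochMillis, MillisPerDay)
    val millisOfDay = Math.floorMod(epochMillis, MillisPerDay)
    ((daysSinceEpoch + JulianDayOfEpoch).toInt, millisOfDay * NanosPerMilli)
  }

  // Inverse of toJulian; sub-millisecond precision is truncated.
  def fromJulian(julianDay: Int, nanosOfDay: Long): Long =
    (julianDay - JulianDayOfEpoch) * MillisPerDay + nanosOfDay / NanosPerMilli
}
```

The sketch works in UTC only and ignores leap seconds; time zone handling would have to be layered on top.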





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68847248
  
  [Test build #25094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25094/consoleFull) for PR 3820 at commit [`d4dbc8a`](https://github.com/apache/spark/commit/d4dbc8a36dee0c708a275d6c23865e5ae9dc8bb4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68847251
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25094/
Test FAILed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68839674
  
  [Test build #25094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25094/consoleFull) for PR 3820 at commit [`d4dbc8a`](https://github.com/apache/spark/commit/d4dbc8a36dee0c708a275d6c23865e5ae9dc8bb4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-06 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68839779
  
Oh sorry, I just checked Impala's configuration, and I think it differs from 
what we have here. I'll change my code to conform to Impala's.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-05 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68834477
  
Thanks for doing this, I've been getting a ton of requests for this feature!

Can you also add this to the SQL programming guide?





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22511350
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -141,6 +142,12 @@ private[sql] trait SQLConf {
     getConf(PARQUET_BINARY_AS_STRING, "false").toBoolean
 
   /**
+   * When set to true, we always treat INT96Values in Parquet files as timestamp.
+   */
+  private[spark] def isParquetINT96AsTimestamp: Boolean =
+    getConf(PARQUET_INT96_AS_TIMESTAMP, "false").toBoolean
--- End diff --

We don't really use INT96 for anything else (and I don't think other 
systems do either?) so maybe this should be true by default?
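
A rough sketch of what a true-by-default flag could look like, using a simplified, hypothetical stand-in for `SQLConf` (the class name `ConfSketch` and the key string are assumptions for illustration, not the code in this PR):

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for SQLConf; not Spark's actual trait.
class ConfSketch {
  private val settings = mutable.Map.empty[String, String]

  def setConf(key: String, value: String): Unit = settings.update(key, value)

  // Returns the stored value, or the supplied default when the key is unset.
  def getConf(key: String, default: String): String =
    settings.getOrElse(key, default)

  // Default is "true": INT96 is read as a timestamp unless the user opts out.
  def isParquetINT96AsTimestamp: Boolean =
    getConf(ConfSketch.ParquetInt96AsTimestamp, "true").toBoolean
}

object ConfSketch {
  // Assumed key name, for illustration only.
  val ParquetInt96AsTimestamp = "spark.sql.parquet.int96AsTimestamp"
}
```

With this shape, users who rely on the old behavior would only need to set the key to `false` explicitly.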





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68624719
  
  [Test build #25027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25027/consoleFull) for PR 3820 at commit [`44d3ab1`](https://github.com/apache/spark/commit/44d3ab1f5bf1bea69a0e49921c0e3a295a387d67).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68626346
  
  [Test build #25027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25027/consoleFull) for PR 3820 at commit [`44d3ab1`](https://github.com/apache/spark/commit/44d3ab1f5bf1bea69a0e49921c0e3a295a387d67).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68626348
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25027/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-30 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68346947
  
retest this please.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68347060
  
  [Test build #24889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24889/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68351142
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24889/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68351139
  
  [Test build #24889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24889/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22360301
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
@@ -84,7 +86,8 @@ private[parquet] class RowReadSupport extends ReadSupport[Row] with Logging {
     // TODO: Why it can be null?
     if (schema == null)  {
       log.debug("falling back to Parquet read schema")
-      schema = ParquetTypesConverter.convertToAttributes(parquetSchema, false)
+      schema = ParquetTypesConverter.convertToAttributes(
+        parquetSchema, new SQLContext(new SparkContext))
--- End diff --

I don't think it's safe to instantiate a SparkContext here, as that's a pretty 
expensive operation and will throw exceptions if there is more than one in a 
single JVM. We can try to refactor this in the future, but for now I'd just pass 
the two options here (using named parameters for the booleans).
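
As a rough illustration of that suggestion, here is a simplified, hypothetical model in which the converter takes the two flags directly instead of a `SQLContext`. The real `ParquetTypesConverter` operates on Parquet schema objects; the string-based field types and the names `ConverterSketch`, `isBinaryAsString`, and `isInt96AsTimestamp` are assumptions made for this sketch.

```scala
// Hypothetical, simplified model; the real converter works on Parquet
// schema objects rather than (name, type-name) string pairs.
object ConverterSketch {
  def convertToAttributes(
      parquetFields: Seq[(String, String)],
      isBinaryAsString: Boolean,
      isInt96AsTimestamp: Boolean): Seq[(String, String)] =
    parquetFields.map {
      case (name, "BINARY") =>
        name -> (if (isBinaryAsString) "StringType" else "BinaryType")
      case (name, "INT96") if isInt96AsTimestamp =>
        name -> "TimestampType"
      case (name, other) =>
        sys.error(s"Unsupported Parquet type $other for field $name")
    }

  // Call site: named arguments make each boolean's meaning explicit,
  // and no SparkContext or SQLContext has to be created.
  def example: Seq[(String, String)] =
    convertToAttributes(
      Seq("payload" -> "BINARY", "ts" -> "INT96"),
      isBinaryAsString = false,
      isInt96AsTimestamp = true)
}
```

Passing plain booleans keeps the read path free of any context construction, at the cost of threading two extra arguments through the callers.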





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68241728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24855/
Test PASSed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68241724
  
  [Test build #24855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68333255
  
  [Test build #24884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24884/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3820#discussion_r22339701
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
@@ -84,7 +86,8 @@ private[parquet] class RowReadSupport extends ReadSupport[Row] with Logging {
     // TODO: Why it can be null?
     if (schema == null)  {
       log.debug("falling back to Parquet read schema")
-      schema = ParquetTypesConverter.convertToAttributes(parquetSchema, false)
+      schema = ParquetTypesConverter.convertToAttributes(
+        parquetSchema, new SQLContext(new SparkContext))
--- End diff --

The only things used from this SQLContext are `isParquetBinaryAsString` and 
`isParquetINT96AsTimestamp`. I'll add a comment here if necessary to point 
this out clearly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68335232
  
  [Test build #24884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24884/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68335236
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24884/
Test FAILed.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-28 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/3820

[SPARK-4987] [SQL] parquet timestamp type support



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark parquettimestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3820


commit d44831a2462b2c049b0222fbb7b8e08023d1f67c
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2014-12-29T07:41:13Z

parquet timestamp type support







[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68238365
  
  [Test build #24855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c).
 * This patch merges cleanly.

