[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-18 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-122529489
  
@liancheng sure, I just wasn't sure if it should be closed :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-18 Thread rtreffer
Github user rtreffer closed the pull request at:

https://github.com/apache/spark/pull/6796





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-18 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-122529136
  
@rtreffer Since #7455 supersedes this PR, would you mind closing this one?





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120858057
  
  [Test build #1055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1055/console) for PR 6796 at commit [`3e30bdf`](https://github.com/apache/spark/commit/3e30bdfb1199a105a882dde7d2dc0bd8edea05a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121009125
  
  [Test build #37143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37143/console) for PR 6796 at commit [`1703c26`](https://github.com/apache/spark/commit/1703c26f917c5e06f60bc9c8cd9299c9ffbb2389).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121009163
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121017435
  
  [Test build #37147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37147/consoleFull) for PR 6796 at commit [`1dad677`](https://github.com/apache/spark/commit/1dad677449445c878d7d938192df4a6b2d997db4).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121024334
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121016709
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121024313
  
  [Test build #37147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37147/console) for PR 6796 at commit [`1dad677`](https://github.com/apache/spark/commit/1dad677449445c878d7d938192df4a6b2d997db4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121016736
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121021867
  
  [Test build #37146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37146/console) for PR 6796 at commit [`83ca029`](https://github.com/apache/spark/commit/83ca029b2ec6e940f73acf9da0eae34319baeb6b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121021895
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121013563
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121013639
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121015081
  
  [Test build #37146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37146/consoleFull) for PR 6796 at commit [`83ca029`](https://github.com/apache/spark/commit/83ca029b2ec6e940f73acf9da0eae34319baeb6b).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121001152
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121001183
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-121002347
  
  [Test build #37143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37143/consoleFull) for PR 6796 at commit [`1703c26`](https://github.com/apache/spark/commit/1703c26f917c5e06f60bc9c8cd9299c9ffbb2389).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120830562
  
  [Test build #1055 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1055/consoleFull) for PR 6796 at commit [`3e30bdf`](https://github.com/apache/spark/commit/3e30bdfb1199a105a882dde7d2dc0bd8edea05a2).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120840863
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120840849
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120841470
  
  [Test build #37130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37130/consoleFull) for PR 6796 at commit [`c8d4d6c`](https://github.com/apache/spark/commit/c8d4d6c9f0e420b2bd54e358b6b73f198ef3373e).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120845430
  
  [Test build #37130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37130/console) for PR 6796 at commit [`c8d4d6c`](https://github.com/apache/spark/commit/c8d4d6c9f0e420b2bd54e358b6b73f198ef3373e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120845445
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120751119
  
  [Test build #37099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37099/console) for PR 6796 at commit [`3e30bdf`](https://github.com/apache/spark/commit/3e30bdfb1199a105a882dde7d2dc0bd8edea05a2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120751126
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120740815
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120740808
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120741423
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120741421
  
  [Test build #37097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37097/console) for PR 6796 at commit [`1152721`](https://github.com/apache/spark/commit/1152721ebeafe7a4535e202c3091f415a8ba3863).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120749766
  
  [Test build #37099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37099/consoleFull) for PR 6796 at commit [`3e30bdf`](https://github.com/apache/spark/commit/3e30bdfb1199a105a882dde7d2dc0bd8edea05a2).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120741354
  
  [Test build #37097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37097/consoleFull) for PR 6796 at commit [`1152721`](https://github.com/apache/spark/commit/1152721ebeafe7a4535e202c3091f415a8ba3863).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120749709
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-120749716
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119297130
  
  [Test build #36702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36702/consoleFull) for PR 6796 at commit [`e6dad45`](https://github.com/apache/spark/commit/e6dad4574f47a7b6694500df2a1b86037c86).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119298855
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119298914
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119297921
  
  [Test build #36702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36702/console) for PR 6796 at commit [`e6dad45`](https://github.com/apache/spark/commit/e6dad4574f47a7b6694500df2a1b86037c86).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119297928
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119301151
  
The writeDecimal method is rather ugly, and the write path needs to know whether 
we follow Parquet style or not, as this implies a different encoding (addInteger 
/ addLong).
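[Editor's note: the two encodings contrasted here can be sketched outside Spark. Per the Parquet format spec, a decimal stored as `FIXED_LEN_BYTE_ARRAY` is the unscaled integer in big-endian two's complement, padded to a fixed width, while the `INT32`/`INT64` paths hand the unscaled value to `addInteger`/`addLong` directly. A minimal Python sketch; the function names are illustrative, not Spark's:]

```python
from decimal import Decimal

def min_bytes_for_precision(precision: int) -> int:
    # Smallest n such that an n-byte signed integer covers every
    # unscaled value of `precision` digits: 2^(8n - 1) > 10^precision.
    length = 1
    while 2 ** (8 * length - 1) < 10 ** precision:
        length += 1
    return length

def encode_decimal_fixed(value: Decimal, precision: int, scale: int) -> bytes:
    # FIXED_LEN_BYTE_ARRAY path: unscaled integer, big-endian,
    # two's complement, padded to the minimal width for the precision.
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(min_bytes_for_precision(precision), "big", signed=True)

def encode_decimal_long(value: Decimal, scale: int) -> int:
    # INT32/INT64 path: the unscaled value is written directly,
    # no byte packing involved.
    return int(value.scaleb(scale))

print(encode_decimal_fixed(Decimal("1.23"), precision=5, scale=2).hex())  # 00007b
print(encode_decimal_long(Decimal("1.23"), scale=2))                      # 123
```

This is why the write path must know which physical type the schema chose: the same logical value produces a padded byte array in one case and a plain integer in the other.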





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119299539
  
    [Test build #36703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36703/consoleFull) for PR 6796 at commit [`7a57c16`](https://github.com/apache/spark/commit/7a57c163ec3fe516d3b173042329a9b6b135efa9).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119296469
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119296488
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119329021
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119328979
  
    [Test build #36703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36703/console) for PR 6796 at commit [`7a57c16`](https://github.com/apache/spark/commit/7a57c163ec3fe516d3b173042329a9b6b135efa9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-07 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-119170768
  
Hi @liancheng,

I'm rebasing on your PR right now. I can work ~1-2 h/day on this PR, so 
feel free to take over the PR if this blocks anything.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-07-05 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-118730215
  
Hey @rtreffer, just wanted to check whether you're still working on 
this. I'm asking because I just opened #7231 to refactor the Parquet read path for 
interoperability and backwards compatibility, which also touches the decimal 
parts. I believe the new [`CatalystDecimalConverter`] [1] already covers the 
read path of decimals with precision > 18, which means this PR can be further 
simplified. Just in case you don't have time to continue this PR, I'm happy to 
fork your branch and get it merged (I'll still list you as the main author).

[1]: 
https://github.com/apache/spark/pull/7231/files#diff-1d6c363c04155a9328fe1f5bd08a2f90R237





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33487350
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-      case dec @ DecimalType() if !followParquetFormatSpec =>
-        throw new AnalysisException(
-          s"Data type $dec is not supported. " +
-            s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-            "decimal precision and scale must be specified, " +
-            "and precision must be less than or equal to 18.")
-
--- End diff --

(Please see my comments [here] [1].)

[1]: 
https://github.com/apache/spark/pull/6796/files#diff-83ef4d5f1029c8bebb49a0c139fa3154R301





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33487385
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -169,11 +169,12 @@ private[parquet] class CatalystSchemaConverter(
 }
 
       case INT96 =>
-        CatalystSchemaConverter.analysisRequire(
-          assumeInt96IsTimestamp,
-          "INT96 is not supported unless it's interpreted as timestamp. " +
-            s"Please try to set ${SQLConf.PARQUET_INT96_AS_TIMESTAMP.key} to true.")
-        TimestampType
+        field.getOriginalType match {
+          case DECIMAL => makeDecimalType(maxPrecisionForBytes(12))
+          case _ if assumeInt96IsTimestamp => TimestampType
+          case null => makeDecimalType(maxPrecisionForBytes(12))
+          case _ => illegalType()
+        }
--- End diff --

Yeah, it's not mentioned anywhere; I just got this information from the Parquet 
dev mailing list :)





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33486747
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -43,16 +43,27 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
   }
 
   /**
-   * Compute the FIXED_LEN_BYTE_ARRAY length needed to represent a given DECIMAL precision.
+   * BYTES_FOR_PRECISION computes the required bytes to store a value of a certain decimal
+   * precision.
    */
-  private[parquet] val BYTES_FOR_PRECISION = Array.tabulate[Int](38) { precision =>
-    var length = 1
+  private[parquet] def BYTES_FOR_PRECISION_COMPUTE(precision: Int): Int = {
+    var length = (precision / math.log10(2) - 1).toInt / 8
     while (math.pow(2.0, 8 * length - 1) < math.pow(10.0, precision)) {
       length += 1
     }
     length
   }
 
+  private[parquet] def BYTES_FOR_PRECISION_STATIC =
--- End diff --

Prefer `bytesForPrecisionStatic`.
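[Editor's note: the loop under review computes the minimal `FIXED_LEN_BYTE_ARRAY` width for a decimal precision. A standalone sketch of the same idea in Python, using exact integer arithmetic rather than `math.pow`, so this illustrates the computation rather than the PR's exact code:]

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte count n whose signed two's-complement range covers
    any unscaled value of the given decimal precision, i.e. the smallest
    n with 2**(8*n - 1) > 10**precision."""
    length = 1
    while 2 ** (8 * length - 1) < 10 ** precision:
        length += 1
    return length

for p in (9, 18, 38):
    print(p, min_bytes_for_precision(p))
# 9 -> 4 (fits INT32), 18 -> 8 (fits INT64), 38 -> 16 (Hive's max precision)
```

Precomputing these values into an array, as the `Array.tabulate` version in the original code does, trades a small lookup table for constant-time access.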





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33487239
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala 
---
@@ -369,9 +371,6 @@ private[parquet] class MutableRowWriteSupport extends RowWriteSupport {
       case DateType => writer.addInteger(record.getInt(index))
       case TimestampType => writeTimestamp(record.getLong(index))
       case d: DecimalType =>
-        if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
-          sys.error(s"Unsupported datatype $d, cannot write to consumer")
-        }
--- End diff --

Had an offline discussion with @yhuai but forgot to post a summary here: in 
the end we decided not to convert unlimited decimals to `decimal(10, 0)` 
implicitly in #6617, because firstly we need to confirm that all other parts work 
in a consistent way, which might introduce unexpected complexity in #6617, and 
secondly implicit conversion can often become a huge footgun. So let's still 
report an error in case of `d.precisionInfo == None` (but please throw an 
`AnalysisException` instead of using `sys.error`).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33486665
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -43,16 +43,27 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
   }
 
   /**
-   * Compute the FIXED_LEN_BYTE_ARRAY length needed to represent a given DECIMAL precision.
+   * BYTES_FOR_PRECISION computes the required bytes to store a value of a certain decimal
+   * precision.
    */
-  private[parquet] val BYTES_FOR_PRECISION = Array.tabulate[Int](38) { precision =>
-    var length = 1
+  private[parquet] def BYTES_FOR_PRECISION_COMPUTE(precision: Int): Int = {
+    var length = (precision / math.log10(2) - 1).toInt / 8
     while (math.pow(2.0, 8 * length - 1) < math.pow(10.0, precision)) {
       length += 1
     }
     length
   }
 
+  private[parquet] def BYTES_FOR_PRECISION_STATIC =
+    (0 to 30).map(BYTES_FOR_PRECISION_COMPUTE).toArray
--- End diff --

30 should probably be replaced with 38, which fits in 16 bytes, and is the 
maximum precision supported in Hive.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33486716
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -43,16 +43,27 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
   }
 
   /**
-   * Compute the FIXED_LEN_BYTE_ARRAY length needed to represent a given DECIMAL precision.
+   * BYTES_FOR_PRECISION computes the required bytes to store a value of a certain decimal
+   * precision.
    */
-  private[parquet] val BYTES_FOR_PRECISION = Array.tabulate[Int](38) { precision =>
-    var length = 1
+  private[parquet] def BYTES_FOR_PRECISION_COMPUTE(precision: Int): Int = {
--- End diff --

Prefer `bytesForPrecision` since it's a method instead of a constant.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-116766442
  
Hi @rtreffer,

 - How is the compatibility mode intended to work? Settings are currently 
private, but I'd like to store Decimal(19), so is lifting the 18 limit correct 
for compatibility mode?

The compatibility mode is enabled by setting 
`spark.sql.parquet.followParquetFormatSpec` to `false`.  This mode must be 
enabled for now, because the write path hasn't been refactored to follow the 
Parquet format spec.  Note that compatibility mode only affects the write path, 
because the Parquet format spec also covers legacy formats by various 
backwards-compatibility rules.

Decimals with precision > 18 could be enabled even in compatibility mode, 
because it doesn't affect compatibility: old Spark versions couldn't read decimals 
with precision > 18 in the first place.

What do you mean by saying settings are currently private?  
`SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC` is `private[spark]`, all classes 
under `org.apache.spark` can access it.

 - INT32/INT64 are only used when the byte length matches the byte length 
for the precision. FIXED_LEN_BYTE_ARRAY will thus e.g. be used to store 6 byte 
values

I see your point.  You mentioned a debate in [this comment] [1]; were you 
referring to [this one] [2]?  From the perspective of storage efficiency, it 
probably makes sense.  (I say "probably" because I'm not quite sure about the 
average case after taking encoding/compression into consideration.)  However, 
in the case of Parquet, we usually care more about speed and memory 
consumption.  Especially, Parquet can be super memory consuming when reading 
files with wide schemas (i.e., a large number of columns).  A key advantage of `INT32` 
and `INT64` is that they avoid boxing costs in many cases and thus can be 
faster and use less memory.  Also, you don't need to do all those bit 
operations to encode/decode the unscaled long value of a decimal when using 
`INT32` and `INT64`.

Meanwhile, Parquet handles `INT32` and `INT64` pretty efficiently.  
There are more encoders for integral types than for binaries (either fixed-length 
or not, see [Encodings.md] [3] for more details).  Although I haven't 
benchmarked this, I believe that in many cases the storage efficiency of `INT32` 
can be comparable to or even better than `FIXED_LEN_BYTE_ARRAY` with a length less 
than 4.  The same should also apply to `INT64`.

So I suggest: when compatibility mode is off, we just use `INT32` for 1 <= 
precision <= 9, and `INT64` for 10 <= precision <= 18 when converting 
`DecimalType`s in `CatalystSchemaConverter`.  When we refactor the write path 
to follow the Parquet format spec, we can write decimals as `INT32` and `INT64` 
when appropriate in follow-up PRs.

The TL;DR is: I'd just remove `precision <= maxPrecisionForBytes(8)` in 
[this line] [4] and leave everything else unmodified (your comment updates look 
good to me though :)

 - FIXED_LEN_BYTE_ARRAY means I'll have to create an array of the correct 
size. I've increased the scratch_bytes. Not very happy about the code path, do 
you have better ideas?

Hive limits the max precision of a decimal to 38, which fits in 16 bytes.  
So 16 rather than 4096 bytes should be enough for most cases.  Also it would be 
better to refactor branches of [this `if` expression] [5] into two separate 
methods for clarity.  Otherwise it looks good.

 - BYTES_FOR_PRECISION needs to handle any precision. I've reworked that 
code. Again, suggestions welcome

(See my other comments inlined.)

[1]: https://github.com/apache/spark/pull/6796#discussion_r33420742
[2]: https://github.com/apache/spark/pull/6796#discussion_r32891515
[3]: https://github.com/Parquet/parquet-format/blob/master/Encodings.md
[4]: 
https://github.com/apache/spark/pull/6796/files#diff-a4c01298c63223d113645a31c01141baL377
[5]: 
https://github.com/apache/spark/pull/6796/files#diff-83ef4d5f1029c8bebb49a0c139fa3154R301
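[Editor's note: the precision bands suggested above follow the Parquet format spec's DECIMAL rules, where up to 9 digits fit the INT32 range and up to 18 fit the INT64 range. A sketch of that dispatch; the function name is illustrative, not Spark's:]

```python
def parquet_type_for_decimal(precision: int):
    # Bands from the Parquet format spec's DECIMAL rules: up to 9
    # digits fit INT32, up to 18 fit INT64; anything larger needs a
    # FIXED_LEN_BYTE_ARRAY sized to the minimal width for the precision.
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    length = 1
    while 2 ** (8 * length - 1) < 10 ** precision:
        length += 1
    return ("FIXED_LEN_BYTE_ARRAY", length)

print(parquet_type_for_decimal(9))   # INT32
print(parquet_type_for_decimal(18))  # INT64
print(parquet_type_for_decimal(38))  # ('FIXED_LEN_BYTE_ARRAY', 16)
```

The trade-off argued for here is that the integral types avoid boxing and bit packing at read time, even when a narrower fixed-length array would be smaller on disk.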






[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-116769197
  
Hi @liancheng, thank you for the thorough review, will push a reworked 
version soon. Everything sounds reasonable :-)

By "private settings" I meant that I can't change the setting in the 
shell because it's marked as `isPublic = false` in 
https://github.com/liancheng/spark/blob/2a2062d3f530ecd26e75b306aee42761d67d8724/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L273

I'm not sure if that's intended.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33487973
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-      case dec @ DecimalType() if !followParquetFormatSpec =>
-        throw new AnalysisException(
-          s"Data type $dec is not supported. " +
-            s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-            "decimal precision and scale must be specified, " +
-            "and precision must be less than or equal to 18.")
-
       // =====================================
       // Decimals (follow Parquet format spec)
       // =====================================
 
-      // Uses INT32 for 1 <= precision <= 9
+      // Uses INT32 for 4 byte encodings / precision <= 9
       case DecimalType.Fixed(precision, scale)
-        if precision <= maxPrecisionForBytes(4) && followParquetFormatSpec =>
+        if followParquetFormatSpec && maxPrecisionForBytes(3) < precision &&
+          precision <= maxPrecisionForBytes(4) =>
--- End diff --

(Please see my comment [here] [1].)

[1]: https://github.com/apache/spark/pull/6796#issuecomment-116766442





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-29 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-116879298
  
@rtreffer Yeah, it's intended. As explained above, this feature flag must 
be set to `false` for now because the write path hasn't been refactored to 
respect the Parquet format spec. If we turn this on, `CatalystSchemaConverter` 
will generate standard Parquet schema while the write path still writes data 
conforming to the old legacy format, which leads to data corruption.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-      case dec @ DecimalType() if !followParquetFormatSpec =>
-        throw new AnalysisException(
-          s"Data type $dec is not supported. " +
-            s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-            "decimal precision and scale must be specified, " +
-            "and precision must be less than or equal to 18.")
-
--- End diff --

We still need this branch to handle the case where precision information is 
missing.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420649
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -404,9 +401,10 @@ private[parquet] class CatalystSchemaConverter(
   .scale(scale)
   .named(field.name)
 
-      // Uses INT64 for 1 <= precision <= 18
+      // Uses INT64 for 8 byte encodings / precision <= 18
       case DecimalType.Fixed(precision, scale)
-        if precision <= maxPrecisionForBytes(8) && followParquetFormatSpec =>
+        if followParquetFormatSpec && maxPrecisionForBytes(7) < precision &&
+          precision <= maxPrecisionForBytes(8) =>
--- End diff --

Same question as above...





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420652
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -562,4 +560,5 @@ private[parquet] object CatalystSchemaConverter {
   throw new AnalysisException(message)
 }
   }
+
--- End diff --

Nit: Remove this newline.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420661
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala 
---
@@ -212,10 +212,7 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Logging {
         case BooleanType => writer.addBoolean(value.asInstanceOf[Boolean])
         case DateType => writer.addInteger(value.asInstanceOf[Int])
         case d: DecimalType =>
-          if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
-            sys.error(s"Unsupported datatype $d, cannot write to consumer")
-          }
-          writeDecimal(value.asInstanceOf[Decimal], d.precisionInfo.get.precision)
+          writeDecimal(value.asInstanceOf[Decimal], d.precisionInfo.map(_.precision).getOrElse(10))
--- End diff --

Need to report error for `DecimalType(None)`.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420742
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-      case dec @ DecimalType() if !followParquetFormatSpec =>
-        throw new AnalysisException(
-          s"Data type $dec is not supported. " +
-            s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-            "decimal precision and scale must be specified, " +
-            "and precision must be less than or equal to 18.")
-
       // =====================================
       // Decimals (follow Parquet format spec)
       // =====================================
 
-      // Uses INT32 for 1 <= precision <= 9
+      // Uses INT32 for 4 byte encodings / precision <= 9
       case DecimalType.Fixed(precision, scale)
-        if precision <= maxPrecisionForBytes(4) && followParquetFormatSpec =>
+        if followParquetFormatSpec && maxPrecisionForBytes(3) < precision &&
+          precision <= maxPrecisionForBytes(4) =>
--- End diff --

We had a debate about using the most compact storage type if possible.

As such, INT32 loses compared to a 3-byte fixed-length array.

Am 28. Juni 2015 10:59:15 MESZ, schrieb Cheng Lian 
notificati...@github.com:
case DecimalType.Fixed(precision, scale)
 -        if precision <= maxPrecisionForBytes(4) && followParquetFormatSpec =>
 +        if followParquetFormatSpec && maxPrecisionForBytes(3) < precision &&
 +          precision <= maxPrecisionForBytes(4) =>

Why do we want `maxPrecisionForBytes(3) < precision` here? Did I miss
something?

---
Reply to this email directly or view it on GitHub:
https://github.com/apache/spark/pull/6796/files#r33420647

-- 
This message was sent from my Android mobile phone with K-9 Mail.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420609
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -169,11 +169,12 @@ private[parquet] class CatalystSchemaConverter(
 }
 
   case INT96 =>
-    CatalystSchemaConverter.analysisRequire(
-      assumeInt96IsTimestamp,
-      "INT96 is not supported unless it's interpreted as timestamp. " +
-        s"Please try to set ${SQLConf.PARQUET_INT96_AS_TIMESTAMP.key} to true.")
-    TimestampType
+    field.getOriginalType match {
+      case DECIMAL => makeDecimalType(maxPrecisionForBytes(12))
+      case _ if assumeInt96IsTimestamp => TimestampType
+      case null => makeDecimalType(maxPrecisionForBytes(12))
+      case _ => illegalType()
+    }
--- End diff --

`INT96` is only used for nanosecond timestamp types for historical reasons, 
and is to be deprecated. Let's not use it for decimals.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420628
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -373,8 +374,10 @@ private[parquet] class CatalystSchemaConverter(
 
   // Spark 1.4.x and prior versions only support decimals with a 
maximum precision of 18 and
   // always store decimals in fixed-length byte arrays.
+  // Always storing FIXED_LEN_BYTE_ARRAY is thus compatible with Spark <= 1.4.x, except for
+  // precisions > 18.
   case DecimalType.Fixed(precision, scale)
-    if precision <= maxPrecisionForBytes(8) && !followParquetFormatSpec =>
+    if !followParquetFormatSpec =>
--- End diff --

Nit: Let's join this line and the line above.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420647
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-  case dec @ DecimalType() if !followParquetFormatSpec =>
-    throw new AnalysisException(
-      s"Data type $dec is not supported. " +
-        s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-        "decimal precision and scale must be specified, " +
-        "and precision must be less than or equal to 18.")
-
   // ===========================================
   // Decimals (follow Parquet format spec)
   // ===========================================

-  // Uses INT32 for 1 <= precision <= 9
+  // Uses INT32 for 4 byte encodings / precision <= 9
   case DecimalType.Fixed(precision, scale)
-    if precision <= maxPrecisionForBytes(4) && followParquetFormatSpec =>
+    if followParquetFormatSpec && maxPrecisionForBytes(3) < precision &&
+      precision <= maxPrecisionForBytes(4) =>
--- End diff --

Why do we want `maxPrecisionForBytes(3) < precision` here? Did I miss 
something?
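For context, `maxPrecisionForBytes(n)` is the largest decimal precision guaranteed to fit in an n-byte two's-complement integer, which is what this guard compares against. A small language-neutral sketch (Python here rather than the Scala of the PR; the function name mirrors the Scala helper) shows why the INT32 branch is gated at precisions above `maxPrecisionForBytes(3)`: anything at or below 6 digits already fits a 3-byte fixed-length array, which is smaller than INT32.

```python
import math

def max_precision_for_bytes(num_bytes: int) -> int:
    """Largest decimal precision guaranteed to fit in a signed
    two's-complement integer of num_bytes bytes."""
    return math.floor(math.log10(2 ** (8 * num_bytes - 1) - 1))

# INT32 covers up to 9 digits, but anything <= 6 digits fits in 3 bytes,
# where a 3-byte FIXED_LEN_BYTE_ARRAY beats the 4-byte INT32.
for n in (3, 4, 8):
    print(n, max_precision_for_bytes(n))  # 3 -> 6, 4 -> 9, 8 -> 18
```

So the debated guard `maxPrecisionForBytes(3) < precision <= maxPrecisionForBytes(4)` restricts INT32 to precisions 7 through 9, leaving 1 through 6 for the more compact fixed-length encoding.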





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420714
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
   .length(minBytesForPrecision(precision))
   .named(field.name)
 
-  case dec @ DecimalType() if !followParquetFormatSpec =>
-    throw new AnalysisException(
-      s"Data type $dec is not supported. " +
-        s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
-        "decimal precision and scale must be specified, " +
-        "and precision must be less than or equal to 18.")
-
--- End diff --

You said it should use the Hive default of (10, 0) - or did I misinterpret 
that?

On 28 June 2015 10:53:00 MESZ, Cheng Lian notificati...@github.com wrote:
 @@ -383,20 +386,14 @@ private[parquet] class CatalystSchemaConverter(
.length(minBytesForPrecision(precision))
.named(field.name)
  
 -  case dec @ DecimalType() if !followParquetFormatSpec =>
 -    throw new AnalysisException(
 -      s"Data type $dec is not supported. " +
 -        s"When ${SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC.key} is set to false, " +
 -        "decimal precision and scale must be specified, " +
 -        "and precision must be less than or equal to 18.")
 -

We still need this branch to handle the case where precision
information is missing.

---
Reply to this email directly or view it on GitHub:
https://github.com/apache/spark/pull/6796/files#r33420570

-- 
This message was sent from my Android mobile phone with K-9 Mail.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-116231004
  
@rtreffer Thanks for rebasing and simplifying this! I left some comments 
but haven't finished my review, will be back after confirming some details 
related to your questions.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-28 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33420719
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/CatalystSchemaConverter.scala
 ---
@@ -169,11 +169,12 @@ private[parquet] class CatalystSchemaConverter(
 }
 
   case INT96 =>
-    CatalystSchemaConverter.analysisRequire(
-      assumeInt96IsTimestamp,
-      "INT96 is not supported unless it's interpreted as timestamp. " +
-        s"Please try to set ${SQLConf.PARQUET_INT96_AS_TIMESTAMP.key} to true.")
-    TimestampType
+    field.getOriginalType match {
+      case DECIMAL => makeDecimalType(maxPrecisionForBytes(12))
+      case _ if assumeInt96IsTimestamp => TimestampType
+      case null => makeDecimalType(maxPrecisionForBytes(12))
+      case _ => illegalType()
+    }
--- End diff --

Didn't know about the deprecation, will drop it.

On 28 June 2015 10:56:00 MESZ, Cheng Lian notificati...@github.com wrote:
 @@ -169,11 +169,12 @@ private[parquet] class CatalystSchemaConverter(
  }
  
   case INT96 =>
 -    CatalystSchemaConverter.analysisRequire(
 -      assumeInt96IsTimestamp,
 -      "INT96 is not supported unless it's interpreted as timestamp. " +
 -        s"Please try to set ${SQLConf.PARQUET_INT96_AS_TIMESTAMP.key} to true.")
 -    TimestampType
 +    field.getOriginalType match {
 +      case DECIMAL => makeDecimalType(maxPrecisionForBytes(12))
 +      case _ if assumeInt96IsTimestamp => TimestampType
 +      case null => makeDecimalType(maxPrecisionForBytes(12))
 +      case _ => illegalType()
 +    }

`INT96` is only used for nanosecond timestamp types for historical
reasons, and is to be deprecated. Let's not use it for decimals.

---
Reply to this email directly or view it on GitHub:
https://github.com/apache/spark/pull/6796/files#r33420609

-- 
This message was sent from my Android mobile phone with K-9 Mail.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115619981
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115619879
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115620346
  
  [Test build #35855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35855/consoleFull)
 for   PR 6796 at commit 
[`5fe321e`](https://github.com/apache/spark/commit/5fe321ee027570eea49869bcbe80c55246538229).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115621136
  
  [Test build #35855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35855/console)
 for   PR 6796 at commit 
[`5fe321e`](https://github.com/apache/spark/commit/5fe321ee027570eea49869bcbe80c55246538229).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115621149
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-26 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115620818
  
@liancheng it starts to work (compiles and minimal initial test worked, no 
guarantees). I think there are some points that need feedback
- How is the compatibility mode intended to work? Settings are currently 
private, but I'd like to store Decimal(19), so is lifting the 18 limit correct 
for compatibility mode?
- INT32/INT64 are only used when the byte length matches the byte length 
for the precision. FIXED_LEN_BYTE_ARRAY will thus e.g. be used to store 6 byte 
values
- FIXED_LEN_BYTE_ARRAY means I'll have to create an array of the correct 
size. I've increased the scratch_bytes. Not very happy about the code path, do 
you have better ideas?
- BYTES_FOR_PRECISION needs to handle any precision. I've reworked that 
code. Again, suggestions welcome

The patch is now way smaller and less intrusive. Looks like the refactoring 
was well worth the effort!
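A minimal sketch of the byte-length helper described above (Python rather than the PR's Scala; `min_bytes_for_precision` is a hypothetical name mirroring `BYTES_FOR_PRECISION`, assuming the standard Parquet rule that an n-byte array holds any unscaled value of precision up to floor(log10(2^(8n-1) - 1))): the smallest fixed-length byte array covering a requested precision, for arbitrary precisions rather than just those that fit a Long.

```python
import math

def max_precision_for_bytes(num_bytes: int) -> int:
    # Largest precision guaranteed to fit in num_bytes signed bytes.
    return math.floor(math.log10(2 ** (8 * num_bytes - 1) - 1))

def min_bytes_for_precision(precision: int) -> int:
    # Smallest fixed-length byte array that can hold any unscaled value
    # of the given decimal precision (works for precision > 18 too).
    num_bytes = 1
    while max_precision_for_bytes(num_bytes) < precision:
        num_bytes += 1
    return num_bytes

print(min_bytes_for_precision(18))  # 8: still fits a Long
print(min_bytes_for_precision(19))  # 9: first precision beyond a Long
print(min_bytes_for_precision(38))  # 16
```

This is why lifting the 18-digit limit forces the FIXED_LEN_BYTE_ARRAY path: precision 19 already needs 9 bytes, one more than a Long provides.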





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-25 Thread sujkh85
Github user sujkh85 commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115389042
  

NAVER - http://www.naver.com/

The mail you sent to su...@naver.com, "Re: [spark] [SPARK-4176][WIP] 
Support decimal types with precision > 18 in parquet (#6796)", failed to 
deliver for the following reason:

The recipient has blocked your mail.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-25 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-11533
  
Currently reworking the patch.

Here is the warning about the tuple match
```
[warn] 
/home/rtreffer/work/spark-master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala:334:
 object Fixed expects 2 patterns to hold (Int, Int) but crushing into 2-tuple 
to fit single pattern (SI-6675)
```
According to the ticket it's a deprecation warning.
https://issues.scala-lang.org/browse/SI-6675

Nothing urgent, but I think it should be fixed at some point.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33124241
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala 
---
@@ -369,9 +371,6 @@ private[parquet] class MutableRowWriteSupport extends 
RowWriteSupport {
   case DateType => writer.addInteger(record.getInt(index))
   case TimestampType => writeTimestamp(record.getLong(index))
   case d: DecimalType =>
-    if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
-      sys.error(s"Unsupported datatype $d, cannot write to consumer")
-    }
--- End diff --

Overlooked, that's a bug. We can't serialize without that info; Parquet requires it 
(and mixed scale would be complicated).

PS: do you know if there is any interest in allowing mixed Decimal in 
parquet?

On 24 June 2015 09:36:11 MESZ, Cheng Lian notificati...@github.com wrote:
 @@ -369,9 +371,6 @@ private[parquet] class MutableRowWriteSupport extends RowWriteSupport {
   case DateType => writer.addInteger(record.getInt(index))
   case TimestampType => writeTimestamp(record.getLong(index))
   case d: DecimalType =>
 -    if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
 -      sys.error(s"Unsupported datatype $d, cannot write to consumer")
 -    }

Don't we need to consider the case where `d.precisionInfo == None` now?

---
Reply to this email directly or view it on GitHub:
https://github.com/apache/spark/pull/6796/files#r33123740

-- 
This message was sent from my Android mobile phone with K-9 Mail.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33123740
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala 
---
@@ -369,9 +371,6 @@ private[parquet] class MutableRowWriteSupport extends 
RowWriteSupport {
   case DateType => writer.addInteger(record.getInt(index))
   case TimestampType => writeTimestamp(record.getLong(index))
   case d: DecimalType =>
-    if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
-      sys.error(s"Unsupported datatype $d, cannot write to consumer")
-    }
--- End diff --

Don't we need to consider the case where `d.precisionInfo == None` now?





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33129732
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala 
---
@@ -331,7 +331,7 @@ private[sql] class JDBCRDD(
   case BooleanType => BooleanConversion
   case DateType => DateConversion
   case DecimalType.Unlimited => DecimalConversion(None)
-  case DecimalType.Fixed(d) => DecimalConversion(Some(d))
+  case DecimalType.Fixed(d, s) => DecimalConversion(Some((d, s)))
--- End diff --

As said it was only about a warning, not about correctness. I'll drop this 
change on the next version, it draws too much attention and is not needed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33123578
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -289,7 +295,13 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
   name: String,
   nullable: Boolean = true,
   inArray: Boolean = false,
+  parquetSchema: Option[ParquetType] = None,
   toThriftSchemaNames: Boolean = false): ParquetType = {
+
+val parquetElementTypeBySchema = parquetSchema.collect {
+case gType : ParquetGroupType if (gType.containsField(name)) => gType.getType(name)
--- End diff --

Nit: Remove the space before `:`





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33128740
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala ---
@@ -108,7 +108,7 @@ class ParquetIOSuiteBase extends QueryTest with 
ParquetTest {
 // Parquet doesn't allow column names with spaces, have to add an 
alias here
.select($"_1" cast decimal as "dec")
 
-for ((precision, scale) - Seq((5, 2), (1, 0), (1, 1), (18, 10), (18, 
17))) {
+for ((precision, scale) - Seq((5, 2), (1, 0), (1, 1), (18, 10), (18, 
17), (60, 5))) {
--- End diff --

It would be good to add one more edge case here, namely `(19, n)`.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33119772
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -229,11 +231,15 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
 case LongType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT64))
 case TimestampType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT96))
 case DecimalType.Fixed(precision, scale) if precision <= 18 =>
-  // TODO: for now, our writer only supports decimals that fit in a Long
   Some(ParquetTypeInfo(ParquetPrimitiveTypeName.FIXED_LEN_BYTE_ARRAY,
--- End diff --

Using int32 and int64 makes encoding and decoding faster since they don't 
introduce boxing costs. But I agree that should be made in another PR.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33133591
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -229,11 +231,15 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
 case LongType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT64))
 case TimestampType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT96))
 case DecimalType.Fixed(precision, scale) if precision <= 18 =>
-  // TODO: for now, our writer only supports decimals that fit in a Long
   Some(ParquetTypeInfo(ParquetPrimitiveTypeName.FIXED_LEN_BYTE_ARRAY,
     Some(ParquetOriginalType.DECIMAL),
     Some(new DecimalMetadata(precision, scale)),
     Some(BYTES_FOR_PRECISION(precision))))
+case DecimalType.Fixed(precision, scale) =>
+  Some(ParquetTypeInfo(ParquetPrimitiveTypeName.BINARY,
--- End diff --

Under the assumption that all values will use the full length, yes.

But at some point the length overhead becomes small compared to the waste when 
someone specifies only an upper bound for the values.
I have to check if BINARY really uses 4 bytes for the length. I'd then raise the 
threshold to ~40 bytes length (meaning <= 10% worst-case overhead before 
compression).

It won't simplify the decoding/writing though, because the <= 18 case is 
used for long decoding.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33132365
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -229,11 +231,15 @@ private[parquet] object ParquetTypesConverter extends 
Logging {
 case LongType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT64))
 case TimestampType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT96))
 case DecimalType.Fixed(precision, scale) if precision <= 18 =>
-  // TODO: for now, our writer only supports decimals that fit in a Long
   Some(ParquetTypeInfo(ParquetPrimitiveTypeName.FIXED_LEN_BYTE_ARRAY,
     Some(ParquetOriginalType.DECIMAL),
     Some(new DecimalMetadata(precision, scale)),
     Some(BYTES_FOR_PRECISION(precision))))
+case DecimalType.Fixed(precision, scale) =>
+  Some(ParquetTypeInfo(ParquetPrimitiveTypeName.BINARY,
--- End diff --

Using `BINARY` here conforms to Parquet format spec. But according to the 
spec, `FIXED_LENGTH_BYTE_ARRAY` with different length can also be used to store 
decimals with different precisions. From the perspective of storage efficiency, 
`FIXED_LENGTH_BYTE_ARRAY` is probably more preferable, since `BINARY` has 
variable length and needs 4 extra bytes to encode the length (before being 
encoded and compressed).

Another benefit here is that we can just unify the cases for precision <= 18 
and precision > 18.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114835713
  
@liancheng I'll rebase on your branch, I really like the way you cleaned up 
toPrimitiveDataType by using a fluent Types interface. This will make this 
patch way easier.

Talking about testing/compatibility/interoperability, I have added a 
hive-generated parquet file that I'd like to turn into a test case:

https://github.com/rtreffer/spark/tree/spark-4176-store-large-decimal-in-parquet/sql/core/src/test/resources/hive-decimal-parquet
 
There are some parquet files attached to tickets in jira, too.
Do you plan to convert those into tests?

Regarding FIXED_LENGTH_BYTE_ARRAY: the overhead decreases relative to the value 
size. BINARY overhead would be ~10% from ~DECIMAL(100) and ~25% from 
~DECIMAL(40) (pre-compression). I'd expect DECIMAL(40) to use the full precision 
only from time to time. But yeah, I overlooked the 4 byte length overhead at 
https://github.com/Parquet/parquet-format/blob/master/Encodings.md and assumed 
it would be less; FIXED_LENGTH_BYTE_ARRAY should be good for now (until someone 
complains).
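The back-of-the-envelope numbers above can be checked: plain-encoded BINARY spends 4 extra length bytes per value, so relative to a fixed-length array of `minBytesForPrecision(p)` bytes the worst-case pre-compression overhead is 4 divided by that byte length. A sketch in Python (helper names are assumptions mirroring the Scala code, not Spark's actual API):

```python
import math

def max_precision_for_bytes(num_bytes: int) -> int:
    # Largest precision guaranteed to fit in num_bytes signed bytes.
    return math.floor(math.log10(2 ** (8 * num_bytes - 1) - 1))

def min_bytes_for_precision(precision: int) -> int:
    # Smallest fixed-length byte array covering the given precision.
    num_bytes = 1
    while max_precision_for_bytes(num_bytes) < precision:
        num_bytes += 1
    return num_bytes

def binary_length_overhead(precision: int) -> float:
    # Plain-encoded BINARY prefixes each value with a 4-byte length;
    # FIXED_LEN_BYTE_ARRAY does not. The ratio is the worst-case overhead.
    return 4 / min_bytes_for_precision(precision)

print(f"{binary_length_overhead(40):.0%}")   # ~24% for DECIMAL(40), 17 bytes
print(f"{binary_length_overhead(100):.0%}")  # ~10% for DECIMAL(100), 42 bytes
```

This matches the rough 25%/10% figures quoted in the thread for DECIMAL(40) and DECIMAL(100).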





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114810864
  
@rtreffer I'm working on improving compatibility and interoperability of 
Spark SQL's Parquet support. The first part is #6617, where I refactored the 
schema conversion code so that we can now stick to the most recent Parquet 
format spec (`ParquetTypes.scala` is replaced with 
`CatalystSchemaConverter.scala`). The schema conversion part of the decimal 
precision problem is also handled there ([1], [2]). Would you mind if I 
merge that one and then you rebase this PR? I think it would be much easier to 
work with. Basically you only need to:

1. Remove the `precision = ...` part in [this line] [3], and
2. Always use `FIXED_LEN_BYTE_ARRAY` to store decimals

[1]: 
https://github.com/apache/spark/pull/6617/files#diff-a4c01298c63223d113645a31c01141baR370
[2]: 
https://github.com/apache/spark/pull/6617/files#diff-a4c01298c63223d113645a31c01141baR118
[3]: 
https://github.com/apache/spark/pull/6617/files#diff-a4c01298c63223d113645a31c01141baR377





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33133479
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
@@ -369,9 +371,6 @@ private[parquet] class MutableRowWriteSupport extends RowWriteSupport {
   case DateType => writer.addInteger(record.getInt(index))
   case TimestampType => writeTimestamp(record.getLong(index))
   case d: DecimalType =>
-    if (d.precisionInfo == None || d.precisionInfo.get.precision > 18) {
-      sys.error(s"Unsupported datatype $d, cannot write to consumer")
-    }
--- End diff --

I don't think Parquet allows mixed decimal precision. However, Hive uses a 
default precision of 10 and a default scale of 0 when precision/scale information 
is missing. I also did the same thing in #6617.
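That defaulting can be sketched as follows; `PrecisionInfo` here is a stand-in for illustration, not necessarily Spark's actual class:

```scala
// Hedged sketch of the fallback described above: when precision/scale
// metadata is absent, fall back to Hive's defaults, DECIMAL(10, 0).
case class PrecisionInfo(precision: Int, scale: Int)

object DecimalDefaults {
  val HiveDefault: (Int, Int) = (10, 0)

  def effective(info: Option[PrecisionInfo]): (Int, Int) =
    info.map(i => (i.precision, i.scale)).getOrElse(HiveDefault)
}
```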





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-115025231
  
@rtreffer I've merged #6617.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114978147
  
@rtreffer Cool thanks! Then I'm merging #6617 shortly. And yes, I plan to 
convert those Parquet files into tests, probably just include them as test 
resources.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114532812
  
  [Test build #35545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35545/console)
 for   PR 6796 at commit 
[`8ff6603`](https://github.com/apache/spark/commit/8ff660369601418c36ed82e9549bd676a83b8345).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114489076
  
Merged build started.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114532858
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114479672
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-114489864
  
  [Test build #35545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35545/consoleFull)
 for   PR 6796 at commit 
[`8ff6603`](https://github.com/apache/spark/commit/8ff660369601418c36ed82e9549bd676a83b8345).





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r33102495
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala ---
@@ -331,7 +331,7 @@ private[sql] class JDBCRDD(
   case BooleanType => BooleanConversion
   case DateType => DateConversion
   case DecimalType.Unlimited => DecimalConversion(None)
-  case DecimalType.Fixed(d) => DecimalConversion(Some(d))
+  case DecimalType.Fixed(d, s) => DecimalConversion(Some((d, s)))
--- End diff --

This should be correct. Scala pattern extractors use tuples if they want to 
return multiple values.
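As a self-contained illustration (the `Decimal` class and `Fixed` object below are toy stand-ins, not Spark's actual `DecimalType.Fixed`), an extractor's `unapply` returns an `Option` of a tuple, and a pattern like `Fixed(d, s)` destructures it:

```scala
// Toy extractor: unapply packs two values into a tuple, the pattern unpacks them.
case class Decimal(precision: Int, scale: Int)

object Fixed {
  def unapply(d: Decimal): Option[(Int, Int)] = Some((d.precision, d.scale))
}

object ExtractorDemo {
  def describe(d: Decimal): String = d match {
    case Fixed(p, s) => s"precision=$p scale=$s"
  }
}
```

So `case DecimalType.Fixed(d, s)` binds the two tuple components directly; wrapping them back up requires the explicit `Some((d, s))` on the right-hand side.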





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-21 Thread rtreffer
Github user rtreffer commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r32892827
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -289,7 +295,20 @@ private[parquet] object ParquetTypesConverter extends Logging {
       name: String,
       nullable: Boolean = true,
       inArray: Boolean = false,
+      parquetSchema: Option[ParquetType] = None,
       toThriftSchemaNames: Boolean = false): ParquetType = {
+
+    val parquetElementTypeBySchema =
--- End diff --

It also performs a type check / conversion. That's why I've removed it. It 
would look like this:

```
val parquetElementTypeBySchema =
  parquetSchema.filter(_.isInstanceOf[ParquetGroupType]).filter(_.containsField(name)).map(_.getType(name))
```

I would settle on `collect`; does that look OK?

```
val parquetElementTypeBySchema = parquetSchema.collect {
  case gType: ParquetGroupType if gType.containsField(name) => gType.getType(name)
}
```





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r32894165
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -229,11 +231,15 @@ private[parquet] object ParquetTypesConverter extends Logging {
     case LongType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT64))
     case TimestampType => Some(ParquetTypeInfo(ParquetPrimitiveTypeName.INT96))
     case DecimalType.Fixed(precision, scale) if precision <= 18 =>
-      // TODO: for now, our writer only supports decimals that fit in a Long
       Some(ParquetTypeInfo(ParquetPrimitiveTypeName.FIXED_LEN_BYTE_ARRAY,
--- End diff --

Make sense.





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6796#discussion_r32894152
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -289,7 +295,20 @@ private[parquet] object ParquetTypesConverter extends Logging {
       name: String,
       nullable: Boolean = true,
       inArray: Boolean = false,
+      parquetSchema: Option[ParquetType] = None,
       toThriftSchemaNames: Boolean = false): ParquetType = {
+
+    val parquetElementTypeBySchema =
--- End diff --

This one is better





[GitHub] spark pull request: [SPARK-4176][WIP] Support decimal types with p...

2015-06-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6796#issuecomment-113950929
  
  [Test build #35414 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35414/consoleFull)
 for   PR 6796 at commit 
[`f973b58`](https://github.com/apache/spark/commit/f973b582150f21a8d8e97937585cd11c407f29fc).




