[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12750 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-217974955 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-217966399 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-217966396 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-217966086 **[Test build #58151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58151/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-217942314 **[Test build #58151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58151/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216396595 LGTM, just one minor comment.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61821978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +260,40 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216376862 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216376855 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216376350 **[Test build #57557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57557/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216354301 **[Test build #57557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57557/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216353667 ;retest

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-05-02 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216353700 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216014587 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216014586 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216014548 **[Test build #57461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57461/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216010060 **[Test build #57461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57461/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216008877 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-216001074 It looks like the failing test recently became flaky in the master branch

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215949797 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215949786 **[Test build #57433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57433/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215949796 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215948577 **[Test build #57433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57433/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215948518 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215947085 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215947083 **[Test build #57429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57429/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215947086 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215946395 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215946397 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215946369 **[Test build #57421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57421/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215945856 **[Test build #57429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57429/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215945786 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215945382 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215945383 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215945376 **[Test build #57425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57425/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215943950 **[Test build #57425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57425/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666430 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala --- @@ -85,17 +85,18 @@ sealed class Metadata private[types]

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala --- @@ -85,17 +85,18 @@ sealed class Metadata private[types]

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666410 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +260,40 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -95,7 +95,8 @@ object HiveTypeCoercion {

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666400 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -103,6 +103,17 @@ case class StructType(fields:

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61666385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala --- @@ -85,17 +85,18 @@ sealed class Metadata private[types]

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215943346 **[Test build #57421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57421/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215943176 After a bit more time in a profiler, I was able to make this 2x faster than my previous best time. To give a rough idea of the structure of my benchmark:

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215941197 @NathanHowell, I played around with your code from https://github.com/apache/spark/pull/12750#issuecomment-215607825 and it was a bit slower than mine because it

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61665637 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -76,6 +78,15 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61665640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +263,39 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61526935 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +263,39 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61526900 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -246,12 +263,39 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/12750#discussion_r61526786 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala --- @@ -76,6 +78,15 @@ private[sql] object

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215607825 Alright, here's a few ideas that will at least reduce allocations by a bit. Your version with the merge sort is likely better than the insertion sort here but I

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215592580 Maybe, but my hunch is that it's going to be slower and won't save much code.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215585269 Would Guava's `Iterables.mergeSorted[T]` help out here?

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-28 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215569915 /cc @NathanHowell, FYI.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215316922 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215316921 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215316797 **[Test build #57215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57215/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215310239 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215310152 **[Test build #57212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57212/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215310236 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215308136 **[Test build #57215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57215/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12750#issuecomment-215303075 **[Test build #57212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57212/consoleFull)** for PR 12750 at commit

[GitHub] spark pull request: [SPARK-14972] Improve performance of JSON sche...

2016-04-27 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/12750 [SPARK-14972] Improve performance of JSON schema inference's compatibleType method This patch improves the performance of `InferSchema.compatibleType` and `inferField`. The net result of this