[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213670329
  
**[Test build #56775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56775/consoleFull)**
 for PR 12629 at commit 
[`e5dec86`](https://github.com/apache/spark/commit/e5dec86845cbf25eb606ceea7a81151c0ed638de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12397#issuecomment-213670252
  
Can we retest this? I think there was a minor change since the test build.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg commented on the pull request:

https://github.com/apache/spark/pull/12501#issuecomment-213670198
  
Closing this in favour of the other implementation.





[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

2016-04-22 Thread robbinspg
Github user robbinspg closed the pull request at:

https://github.com/apache/spark/pull/12501





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12319





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12630





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213669972
  
Thanks - merging in master.






[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12630#issuecomment-213669781
  
Merging in master.






[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213669684
  
**[Test build #56782 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56782/consoleFull)**
 for PR 12568 at commit 
[`3690c7c`](https://github.com/apache/spark/commit/3690c7cc210dc9aedd168202dff17902f4c0c4e6).





[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...

2016-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12626





[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-04-22 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/12416#issuecomment-213668195
  
@dbtsai I'll update the PR soon, I have been overwhelmed by office work :(





[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...

2016-04-22 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12626#issuecomment-213668471
  
thanks, merging to master!





[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...

2016-04-22 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12626#issuecomment-213667937
  
LGTM





[GitHub] spark pull request: [SPARK-11102] [SQL] Uninformative exception wh...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9490#issuecomment-213668158
  
ping @zjffdu





[GitHub] spark pull request: [SPARK-14857] [SQL] Table/Database Name Valida...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12618#issuecomment-213667639
  
**[Test build #56781 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56781/consoleFull)**
 for PR 12618 at commit 
[`bfe536e`](https://github.com/apache/spark/commit/bfe536eaba938be18253aeb71eb79a56f69856ce).





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12619





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213667183
  
Merging this into master, thanks!





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213666884
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56774/
Test PASSed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213666883
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213666846
  
**[Test build #56774 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56774/consoleFull)**
 for PR 12619 at commit 
[`6056a47`](https://github.com/apache/spark/commit/6056a47cf807cbf70f7f26af7dcc07737dc232c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12630#issuecomment-213666752
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56771/
Test PASSed.





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12630#issuecomment-213666751
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12630#issuecomment-213666711
  
**[Test build #56771 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56771/consoleFull)**
 for PR 12630 at commit 
[`06ff604`](https://github.com/apache/spark/commit/06ff604d90fcd1fc6477dbb6533c4652ec9f12a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14867][BUILD] Make `build/mvn` to use t...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12631#issuecomment-21364
  
**[Test build #56780 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56780/consoleFull)**
 for PR 12631 at commit 
[`56355fa`](https://github.com/apache/spark/commit/56355fab23ef59b79bbaafa71643ca742326).





[GitHub] spark pull request: [SPARK-14867][BUILD] Make `build/mvn` to use t...

2016-04-22 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12631

[SPARK-14867][BUILD] Make `build/mvn` to use the downloaded maven if it 
exists.

## What changes were proposed in this pull request?

Currently, `build/mvn` provides a convenient option, `--force`, for using 
the recommended version of Maven without changing the PATH environment 
variable. However, there are two problems.

- `dev/lint-java` does not use the newly installed maven.

  ```bash
  $ ./build/mvn --force clean
  $ ./dev/lint-java 
  Using `mvn` from path: /usr/local/bin/mvn
  ```
- It is tedious to type the `--force` option every time.

Once the `--force` option has been used, it is better to keep preferring the 
installed Maven version recommended by Spark.
This PR makes `build/mvn` first check whether a Maven installed by the 
`--force` option exists.
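
The lookup order described above can be sketched as a small shell function. This is only an illustrative sketch, not the actual change to `build/mvn`: the function name `pick_mvn` and its argument are hypothetical, and it assumes the downloaded Maven lives under `build/apache-maven-*/bin/mvn`, as in the manual test output in this description.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual contents of build/mvn.
# Prints the path of the Maven binary that should be used.
#   $1: directory into which a previous `--force` run unpacked Maven
pick_mvn() {
  local build_dir="$1"
  local candidate
  # Prefer a locally downloaded Maven (e.g. apache-maven-3.3.9) if present.
  for candidate in "${build_dir}"/apache-maven-*/bin/mvn; do
    if [ -x "${candidate}" ]; then
      echo "${candidate}"
      return 0
    fi
  done
  # Otherwise fall back to whatever `mvn` is found on PATH, if any.
  command -v mvn
}
```

Calling `pick_mvn ./build` would print the downloaded binary when it exists and the system `mvn` otherwise; the function returns a non-zero status if neither is available.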

## How was this patch tested?

Manual.

```bash
$ ./build/mvn --force clean
$ ./dev/lint-java 
Using `mvn` from path: 
/Users/dongjoon/spark/build/apache-maven-3.3.9/bin/mvn
...
$ rm -rf ./build/apache-maven-3.3.9/
$ ./dev/lint-java 
Using `mvn` from path: /usr/local/bin/mvn
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-14867

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12631


commit 56355fab23ef59b79bbaafa71643ca742326
Author: Dongjoon Hyun 
Date:   2016-04-14T08:55:46Z

[SPARK-14867][BUILD] Make `build/mvn` to use the downloaded maven if it 
exist.







[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12590#issuecomment-213666548
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56772/
Test PASSed.





[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12590#issuecomment-213666547
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12590#issuecomment-213666496
  
**[Test build #56772 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56772/consoleFull)**
 for PR 12590 at commit 
[`bda4ae6`](https://github.com/apache/spark/commit/bda4ae62c812b256da4bb7f89f07623dd87ea439).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12402#discussion_r60823427
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] (
   @Since("2.0.0")
   def gaussians: Array[MultivariateGaussian] = parentModel.gaussians
 
+  /**
+   * Helper method used in Python.
+   * Retrieve Gaussian distributions as a DataFrame.
+   * Each row represents a Gaussian Distribution.
+   * Two columns are defined: mean and cov.
+   * Schema:
+   * root
--- End diff --

Surround schema with triple braces to make it appear like code:
```
{{{
root
|-- ...
}}}
```





[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12402#discussion_r60823428
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] (
   @Since("2.0.0")
   def gaussians: Array[MultivariateGaussian] = parentModel.gaussians
 
+  /**
+   * Helper method used in Python.
+   * Retrieve Gaussian distributions as a DataFrame.
+   * Each row represents a Gaussian Distribution.
+   * Two columns are defined: mean and cov.
+   * Schema:
+   * root
+   * |-- mean: vector (nullable = true)
+   * |-- cov: matrix (nullable = true)
+   */
+  def gaussiansDF: DataFrame = {
--- End diff --

Since 2.0.0





[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12402#discussion_r60823426
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -104,6 +105,25 @@ class GaussianMixtureModel private[ml] (
   @Since("2.0.0")
   def gaussians: Array[MultivariateGaussian] = parentModel.gaussians
 
+  /**
+   * Helper method used in Python.
--- End diff --

Remove this 1 line.  (This is an implementation detail and should not be 
exposed in user docs.)





[GitHub] spark pull request: [SPARK-6717][ML] Clear shuffle files after che...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11919#issuecomment-213664760
  
**[Test build #56779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56779/consoleFull)**
 for PR 11919 at commit 
[`dd50130`](https://github.com/apache/spark/commit/dd5013002611d3c232b8384eef89f13f9113eef4).





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663864
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56767/
Test FAILed.





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663862
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663616
  
**[Test build #56767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56767/consoleFull)**
 for PR 12568 at commit 
[`4acfb8c`](https://github.com/apache/spark/commit/4acfb8c4eb24f3a6fdee67252d495c44fe44b2b9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12497#issuecomment-213663512
  
**[Test build #56778 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56778/consoleFull)**
 for PR 12497 at commit 
[`ab42268`](https://github.com/apache/spark/commit/ab42268106dfde1b3de156f47f4cebfcca50129e).





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663454
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports

2016-04-22 Thread zhengruifeng
Github user zhengruifeng commented on the pull request:

https://github.com/apache/spark/pull/12497#issuecomment-213663449
  
@srowen I have reviewed all Scala files in GraphX and some in SQL, and 
removed some more unused imports in this PR.





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663455
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56766/
Test FAILed.





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213663432
  
**[Test build #56766 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56766/consoleFull)**
 for PR 12568 at commit 
[`04cc43b`](https://github.com/apache/spark/commit/04cc43b29cbf7e4b71046b848e446143b0b212a1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest

2016-04-22 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12124#issuecomment-213663303
  
I'm working on a simpler fix for now: 
[https://issues.apache.org/jira/browse/SPARK-14862]





[GitHub] spark pull request: [Minor][ML][MLLIB] Remove unused imports

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12497#issuecomment-213663339
  
**[Test build #56777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56777/consoleFull)**
 for PR 12497 at commit 
[`90e57c8`](https://github.com/apache/spark/commit/90e57c8bc98abc36e0c3a26f348da64b358acd3d).





[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12239#issuecomment-213663172
  
**[Test build #56776 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56776/consoleFull)**
 for PR 12239 at commit 
[`82add06`](https://github.com/apache/spark/commit/82add06177c6b730459aea5eb7e277a0615147fd).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663159
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663160
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56768/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663125
  
**[Test build #56768 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)**
 for PR 12268 at commit 
[`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-22 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12239#issuecomment-213663088
  
retest this please





[GitHub] spark pull request: [SPARK-14838][SQL] Implement statistics in Ser...

2016-04-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12599#discussion_r60823141
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -83,6 +83,28 @@ case class SerializeFromObject(
     child: LogicalPlan) extends UnaryNode with ObjectConsumer {
 
   override def output: Seq[Attribute] = serializer.map(_.toAttribute)
+
+  // We can't estimate the size of ObjectType. We implement statistics here to avoid
+  // directly estimate any child plan which produces domain objects as output.
+  override def statistics: Statistics = {
+    if (child.output.head.dataType.isInstanceOf[ObjectType]) {
+      val underlyingPlan = child.find { p =>
--- End diff --

+1 for the 4k default size





[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread JustinPihony
Github user JustinPihony commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213662908
  
@HyukjinKwon I just posted on the JIRA the background of `Properties` and 
how reasonable it is to assume it can be converted to a `String`. 
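
As a rough illustration of the conversion being discussed, here is a minimal sketch of copying a `java.util.Properties` into a `Map[String, String]`. The object name `PropertiesToStringMap` and the helper `toStringMap` are hypothetical and are not taken from this PR or from Spark; the sketch also assumes a Scala version where `scala.collection.JavaConverters` is available (it is deprecated in newer releases).

```scala
import java.util.Properties
import scala.collection.JavaConverters._

object PropertiesToStringMap {
  // Hypothetical helper, not part of Spark: copies every entry whose key and
  // value are both Strings (defaults included) into an immutable Map.
  def toStringMap(props: Properties): Map[String, String] =
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "spark")
    props.setProperty("fetchsize", "100")
    // Properties extends Hashtable[Object, Object], so a non-String value can
    // be stored; stringPropertyNames() omits such entries.
    props.put("timeout", Integer.valueOf(30))
    println(toStringMap(props))
  }
}
```

The point of the sketch is that `Properties` is only string-typed by convention: entries written through `setProperty` convert cleanly, while anything stored through the inherited `put` with a non-String value is silently dropped by `stringPropertyNames()`.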





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213662852
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213662854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56770/
Test PASSed.





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213662817
  
**[Test build #56770 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56770/consoleFull)**
 for PR 12319 at commit 
[`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213662708
  
**[Test build #2859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2859/consoleFull)**
 for PR 12319 at commit 
[`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213662696
  
**[Test build #56775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56775/consoleFull)**
 for PR 12629 at commit 
[`e5dec86`](https://github.com/apache/spark/commit/e5dec86845cbf25eb606ceea7a81151c0ed638de).





[GitHub] spark pull request: [SPARK-6717][ML] Clear shuffle files after che...

2016-04-22 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11919#discussion_r60822985
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -1306,4 +1306,33 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
    * satisfies this requirement, we simply use a type alias here.
    */
   private[recommendation] type ALSPartitioner = org.apache.spark.HashPartitioner
+
+  /**
+   * Private function to checkpoint the RDD and clean up its all of its parents' shuffles eagerly.
+   */
+  private[spark] def checkpointAndCleanParents[T](rdd: RDD[T], blocking: Boolean = false): Unit = {
+    val sc = rdd.sparkContext
+    // If there is no reference tracking we skip clean up.
+    if (sc.cleaner.isEmpty) {
+      return rdd.checkpoint()
--- End diff --

Ah, that's a good catch (this used to not be an issue since I left the 
materialization in the initial PR for both). Anyway, I'll refactor this to break 
up the cleanup and explicitly capture the deps.





[GitHub] spark pull request: [SPARK-8398][CORE] Hadoop input/output format ...

2016-04-22 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/6848#issuecomment-213661475
  
@holdenk ok i tried to make it look all pretty





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213661444
  
**[Test build #56774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56774/consoleFull)**
 for PR 12619 at commit 
[`6056a47`](https://github.com/apache/spark/commit/6056a47cf807cbf70f7f26af7dcc07737dc232c1).





[GitHub] spark pull request: [SPARK-14654][CORE][WIP] New accumulator API

2016-04-22 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12612#discussion_r60822740
  
--- Diff: core/src/main/scala/org/apache/spark/NewAccumulator.scala ---
@@ -0,0 +1,299 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.{lang => jl}
+import java.io.{ObjectInputStream, ObjectOutputStream}
+import java.util.concurrent.atomic.AtomicLong
+import javax.annotation.concurrent.GuardedBy
+
+import org.apache.spark.scheduler.AccumulableInfo
+import org.apache.spark.util.Utils
+
+
+private[spark] case class AccumulatorMetadata(
+    id: Long,
+    name: Option[String],
+    countFailedValues: Boolean) extends Serializable
+
+
+abstract class NewAccumulator[IN, OUT] extends Serializable {
+  private[spark] var metadata: AccumulatorMetadata = _
+
+  private[spark] def register(
+      sc: SparkContext,
+      id: Long = AccumulatorContext.newId(),
+      name: Option[String] = None,
+      countFailedValues: Boolean = false): Unit = {
+    if (this.metadata != null) {
+      throw new IllegalStateException("Cannot register an Accumulator twice.")
+    }
+    this.metadata = AccumulatorMetadata(id, name, countFailedValues)
+    AccumulatorContext.register(this)
+    sc.cleaner.foreach(_.registerAccumulatorForCleanup(this))
+  }
+
+  private[spark] def assertRegistered(): Unit = {
+    if (metadata == null) {
+      throw new IllegalStateException("Accumulator is not registered yet")
+    }
+  }
+
+  def id: Long = {
+    assertRegistered()
+    metadata.id
+  }
+
+  def initialize(): Unit = {}
+
+  def add(v: IN): Unit
+
+  def +=(v: IN): Unit = add(v)
+
+  def merge(other: NewAccumulator[IN, OUT]): Unit
+
+  def ++=(other: NewAccumulator[IN, OUT]): Unit = merge(other)
+
+  def value: OUT
+
+  private[spark] def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = {
+    assertRegistered()
+    val isInternal = metadata.name.exists(_.startsWith(InternalAccumulator.METRICS_PREFIX))
+    new AccumulableInfo(
+      metadata.id, metadata.name, update, value, isInternal, metadata.countFailedValues)
+  }
+
+  // Called by Java when serializing an object
+  private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException {
+    assertRegistered()
+    out.defaultWriteObject()
+  }
+
+  // Called by Java when deserializing an object
+  private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException {
+    in.defaultReadObject()
+    initialize()
+
+    // Automatically register the accumulator when it is deserialized with the task closure.
+    // This is for external accumulators and internal ones that do not represent task level
+    // metrics, e.g. internal SQL metrics, which are per-operator.
+    val taskContext = TaskContext.get()
+    if (taskContext != null) {
+      taskContext.registerAccumulator(this)
+    }
+  }
+}
+
+object AccumulatorContext {
+
+  /**
+   * This global map holds the original accumulator objects that are created on the driver.
+   * It keeps weak references to these objects so that accumulators can be garbage-collected
+   * once the RDDs and user-code that reference them are cleaned up.
+   * TODO: Don't use a global map; these should be tied to a SparkContext (SPARK-13051).
+   */
+  @GuardedBy("AccumulatorContext")
+  private val originals = new java.util.HashMap[Long, jl.ref.WeakReference[NewAccumulator[_, _]]]
+
+  private[this] val nextId = new AtomicLong(0L)
+
+  /**
+   * Return a globally unique ID for a new [[NewAccumulator]].
+   * Note: Once you copy the [[NewAccumulator]] the ID is no longer unique.
+   */
+  def newId(): Long = nextId.getAndIncrement
+
+  /**
+   * Register an [[NewAccumulator]] 

[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213661096
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213661097
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56765/
Test FAILed.





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213661066
  
**[Test build #56765 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56765/consoleFull)**
 for PR 12629 at commit 
[`ccd3c7b`](https://github.com/apache/spark/commit/ccd3c7b43c0247e345b714210f7421d7dc484718).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660941
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56773/
Test FAILed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660938
  
**[Test build #56773 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56773/consoleFull)**
 for PR 12619 at commit 
[`871c009`](https://github.com/apache/spark/commit/871c00960dce2b1bf598e781dbd1ba3f18dddf3f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660940
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/12619#discussion_r60822682
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
@@ -308,8 +304,11 @@ private[sql] class DefaultSource
     // TODO: if you move this into the closure it reverts to the default values.
     // If true, enable using the custom RecordReader for parquet. This only works for
     // a subset of the types (no complex types).
-    val enableVectorizedParquetReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
-      dataSchema.forall(_.dataType.isInstanceOf[AtomicType])
+    val resultSchema = StructType(partitionSchema.fields ++ requiredSchema.fields)
+    val enableVectorizedReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
--- End diff --

Nothing too important. The comment `// If true, enable using the custom 
RecordReader for parquet...` could be above this line.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660896
  
**[Test build #56773 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56773/consoleFull)**
 for PR 12619 at commit 
[`871c009`](https://github.com/apache/spark/commit/871c00960dce2b1bf598e781dbd1ba3f18dddf3f).





[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12590#issuecomment-213660696
  
**[Test build #56772 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56772/consoleFull)**
 for PR 12590 at commit 
[`bda4ae6`](https://github.com/apache/spark/commit/bda4ae62c812b256da4bb7f89f07623dd87ea439).





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12630#issuecomment-213660698
  
**[Test build #56771 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56771/consoleFull)**
 for PR 12630 at commit 
[`06ff604`](https://github.com/apache/spark/commit/06ff604d90fcd1fc6477dbb6533c4652ec9f12a8).





[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...

2016-04-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12590#issuecomment-213660622
  
Rebased.





[GitHub] spark pull request: [SPARK-14866][SQL] Break SQLQuerySuite out int...

2016-04-22 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/12630

[SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suites

## What changes were proposed in this pull request?
This patch breaks SQLQuerySuite out into smaller test suites. It was a 
little bit too large for debugging.

## How was this patch tested?
This is a test only change.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-14866

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12630


commit 06ff604d90fcd1fc6477dbb6533c4652ec9f12a8
Author: Reynold Xin 
Date:   2016-04-23T03:42:16Z

[SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suites







[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12615#issuecomment-213660185
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12615#issuecomment-213660186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56764/
Test PASSed.





[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12615#issuecomment-213660152
  
**[Test build #56764 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56764/consoleFull)**
 for PR 12615 at commit 
[`957c3c1`](https://github.com/apache/spark/commit/957c3c130aeeb31445027168add0f6a99acd3fe8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...

2016-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12615





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660054
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56769/
Test FAILed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660053
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213660046
  
**[Test build #56769 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56769/consoleFull)**
 for PR 12619 at commit 
[`04a900b`](https://github.com/apache/spark/commit/04a900b10a04d3930f0dbb7ad3d570552de49075).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9314] [EC2] add root EBS config options...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/7647#issuecomment-213659877
  
ping @kmaehashi 





[GitHub] spark pull request: [SPARK-14842][SQL] Implement view creation in ...

2016-04-22 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12615#issuecomment-213659905
  
Merging in master.






[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213659831
  
**[Test build #2859 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2859/consoleFull)**
 for PR 12319 at commit 
[`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213659743
  
**[Test build #56770 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56770/consoleFull)**
 for PR 12319 at commit 
[`d6bc52d`](https://github.com/apache/spark/commit/d6bc52d8ba2ff1e10f110d92de865aeae71f9d52).





[GitHub] spark pull request: [SPARK-8603] [sparkR] In windows, Incorrect fi...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/7025#issuecomment-213659754
  
ping @prakashpc 





[GitHub] spark pull request: [SPARK-8603] [sparkR] In windows, Incorrect fi...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/7025#issuecomment-213659474
  
@JoshRosen I can submit a PR based on this if you think this PR is 
abandoned.





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213659267
  
LGTM pending Jenkins.





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213658964
  
add to whitelist





[GitHub] spark pull request: [SPARK-14551][SQL] Reduce number of NameNode c...

2016-04-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/12319#issuecomment-213658842
  
test this please





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12619#issuecomment-213658753
  
**[Test build #56769 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56769/consoleFull)**
 for PR 12619 at commit 
[`04a900b`](https://github.com/apache/spark/commit/04a900b10a04d3930f0dbb7ad3d570552de49075).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213658568
  
**[Test build #56768 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)**
 for PR 12268 at commit 
[`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55).





[GitHub] spark pull request: [SPARK-14856] [SQL] returning batch correctly

2016-04-22 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12619#discussion_r60822250
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
@@ -308,8 +304,11 @@ private[sql] class DefaultSource
 // TODO: if you move this into the closure it reverts to the default values.
 // If true, enable using the custom RecordReader for parquet. This only works for
 // a subset of the types (no complex types).
-val enableVectorizedParquetReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
-  dataSchema.forall(_.dataType.isInstanceOf[AtomicType])
+val resultSchema = StructType(partitionSchema.fields ++ requiredSchema.fields)
+val enableVectorizedReader: Boolean = sqlContext.conf.parquetVectorizedReaderEnabled &&
--- End diff --

move to where?





[GitHub] spark pull request: [SPARK-14828][SQL] Start SparkSession in REPL ...

2016-04-22 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12589#discussion_r60822234
  
--- Diff: repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -1026,21 +1025,7 @@ class SparkILoop(
   }
 
   @DeveloperApi
-  def createSQLContext(): SQLContext = {
-val name = "org.apache.spark.sql.hive.HiveContext"
-val loader = Utils.getContextOrSparkClassLoader
-try {
-  sqlContext = loader.loadClass(name).getConstructor(classOf[SparkContext])
-.newInstance(sparkContext).asInstanceOf[SQLContext]
-  logInfo("Created sql context (with Hive support)..")
-}
-catch {
-  case _: java.lang.ClassNotFoundException | _: java.lang.NoClassDefFoundError =>
-sqlContext = new SQLContext(sparkContext)
-logInfo("Created sql context..")
-}
-sqlContext
-  }
+  def createSparkSession(): SparkSession = Main.createSparkSession()
--- End diff --

Not very sure. How about we still duplicate the code for now?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213658002
  
@rxin Could you please review this?





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213657538
  
**[Test build #56767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56767/consoleFull)**
 for PR 12568 at commit 
[`4acfb8c`](https://github.com/apache/spark/commit/4acfb8c4eb24f3a6fdee67252d495c44fe44b2b9).





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213656755
  
**[Test build #56766 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56766/consoleFull)**
 for PR 12568 at commit 
[`04cc43b`](https://github.com/apache/spark/commit/04cc43b29cbf7e4b71046b848e446143b0b212a1).





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213656756
  
**[Test build #56765 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56765/consoleFull)**
 for PR 12629 at commit 
[`ccd3c7b`](https://github.com/apache/spark/commit/ccd3c7b43c0247e345b714210f7421d7dc484718).





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12629#issuecomment-213656738
  
cc @davies @viirya 





[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-04-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/12629

[SPARK-14800][SQL] Dealing with null as a value in options for each 
internal data source

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-14800

This PR adds support for `null` as an option value (treated as the default 
value) in all the internal data sources in Spark.

This PR introduces two classes:

- `PrameterUtils`: contains the `null`-checking functions used in 
`CSVOptions`, so that the other data sources can use them as well.
- `OrcOptions`: split out into its own class, just like `ParquetOptions` 
(the two are almost identical).
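
As a rough illustration of the idea above (not the code in this PR), a null-safe option lookup can treat a `null` value exactly like a missing key and fall back to the default. The class name `NullSafeOptions` and its methods are hypothetical, as are the option keys used in `main`; this is only a minimal sketch in plain Java:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not the class added by the PR.
public final class NullSafeOptions {
    private NullSafeOptions() {}

    // Returns the option value, or defaultValue when the key is absent or mapped to null.
    public static String getString(Map<String, String> options, String key, String defaultValue) {
        String value = options.get(key);
        return value == null ? defaultValue : value;
    }

    // Same fallback rule for booleans; any non-null value other than true/false is rejected.
    public static boolean getBoolean(Map<String, String> options, String key, boolean defaultValue) {
        String value = options.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (value.equalsIgnoreCase("true")) {
            return true;
        }
        if (value.equalsIgnoreCase("false")) {
            return false;
        }
        throw new IllegalArgumentException(key + " must be true or false, got: " + value);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("header", null);    // explicitly null: falls back to the default
        options.put("delimiter", "|");  // explicitly set: the given value wins
        System.out.println(getBoolean(options, "header", false));
        System.out.println(getString(options, "delimiter", ","));
        System.out.println(getString(options, "quote", "\""));
    }
}
```

The point of the sketch is that callers never dereference the raw map value, so passing `null` for an option cannot cause a `NullPointerException` during parsing.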

## How was this patch tested?

Unit tests in `CSVSuite`, `JsonSuite`, `OrcHadoopFsRelationSuite`, 
`ParquetHadoopFsRelationSuite` and `LibSVMRelation`. Also, `sbt scalastyle`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-14800

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12629


commit 8fb3a23ef61353749c35f523dbfc7d8f5d739fbf
Author: hyukjinkwon 
Date:   2016-04-23T02:09:36Z

CSV and JSON are now safe with null options

commit ccd3c7b43c0247e345b714210f7421d7dc484718
Author: hyukjinkwon 
Date:   2016-04-23T02:55:02Z

text, ORC, Parquet and libsvm are also okay







[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12568#discussion_r60822093
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -182,7 +182,7 @@ private[spark] class BlockManager(
 val shuffleConfig = new ExecutorShuffleInfo(
   diskBlockManager.localDirs.map(_.toString),
   diskBlockManager.subDirsPerLocalDir,
-  shuffleManager.shortName)
--- End diff --

Yes, I agree with @markgrover .





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread lianhuiwang
Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/12568#issuecomment-213656641
  
@vanzin I have addressed your comments. Thanks.





[GitHub] spark pull request: [SPARK-14731][shuffle]Revert SPARK-12130 to ma...

2016-04-22 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12568#discussion_r60821688
  
--- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleIntegrationSuite.java ---
@@ -184,12 +184,9 @@ public void testFetchThreeSort() throws Exception {
 exec0Fetch.releaseBuffers();
   }
 
-  @Test
-  public void testFetchInvalidShuffle() throws Exception {
+  @Test (expected = RuntimeException.class)
--- End diff --

It will throw a generic RuntimeException.





[GitHub] spark pull request: [SPARK-14863][SQL] Cache TreeNode's hashCode b...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12626#issuecomment-213653604
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56761/
Test PASSed.




