[GitHub] spark pull request: [SPARK-11815] [ML] [PySpark] PySpark DecisionT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9807#issuecomment-163688796
**[Test build #47511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47511/consoleFull)** for PR 9807 at commit [`9dd8870`](https://github.com/apache/spark/commit/9dd88706a1401598f6a818958ee9f10ea73dea57).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11815] [ML] [PySpark] PySpark DecisionT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9807#issuecomment-163688975
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47511/
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user dereksabryfb commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163699828
Added a case for sort
[GitHub] spark pull request: [SPARK-12198] [SparkR] SparkR support read.par...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10191#issuecomment-163699843
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163704082
**[Test build #47528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47528/consoleFull)** for PR 10240 at commit [`d8be669`](https://github.com/apache/spark/commit/d8be66911d2abf3da46a25a54a7d80fd1eeebdfa).
[GitHub] spark pull request: [Streaming][Doc][Minor] Update the description...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10246#issuecomment-163706432
LGTM
[GitHub] spark pull request: [SPARK-12235][SPARKR] Enhance mutate() to supp...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10220#issuecomment-163706535
@felixcheung Could you see if this satisfies the requirements in https://issues.apache.org/jira/browse/SPARK-10346? The only other thing we had in mind was to match the signature of `mutate` in dplyr.
[GitHub] spark pull request: [SPARK-7286] [SQL] Deprecate !== in favour of ...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/9925#issuecomment-163708919
I agree that it's not pretty; however, the only other fix I see is to remove `$` for columns instead.
[GitHub] spark pull request: [DOCS][ML][SPARK-11964] Add in Pipeline Import...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10179#issuecomment-163712455
@anabranch Hm, I may not have been clear enough. The save/load functionality seems general and important enough that it should go under the "Main concepts in Pipelines" section; I would put a subsection with a small paragraph (without code) at the end of the "Main concepts in Pipelines" section, just before the "Code example" section. I would then modify the first code example, "Example: Estimator, Transformer, and Param", to include saving and loading the pipeline. Thanks!
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47264940
--- Diff: docs/ml-classification-regression.md ---
@@ -27,10 +27,10 @@ displayTitle: Classification and regression in spark.ml
 * This will become a table of contents (this text will be scraped).
 {:toc}

-In MLlib, we implement popular linear methods such as logistic
+In `spark.ml`, we implement popular linear methods such as logistic
--- End diff --
I see the purpose now. It was the old MLlib text, but a lot of it still applies. The distinction is removed.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163714372
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47530/
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163714745
**[Test build #47527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47527/consoleFull)** for PR 10204 at commit [`c5294a9`](https://github.com/apache/spark/commit/c5294a91a52124fa45cb32bd5799d6f1c0374fd0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47267569
--- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
     val numActiveTasks = memoryForTask.keys.size
     val curMem = memoryForTask(taskAttemptId)
-    // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-    // don't let it be negative
-    val maxToGrant =
-      math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+    // In every iteration of this loop, we should first try to reclaim any borrowed execution
+    // space from storage. This is necessary because of the potential race condition where new
+    // storage blocks may steal the free execution memory that this task was waiting for.
+    maybeGrowPool(numBytes - memoryFree)
+
+    // Maximum size the pool would have after potentially growing the pool.
+    // This is used to compute the upper bound of how much memory each task can occupy. This
+    // must take into account potential free memory as well as the amount this pool currently
+    // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+    // we did not take into account space that could have been freed by evicting cached blocks.
+    val maxPoolSize = computeMaxPoolSize()
+    val maxMemoryPerTask = maxPoolSize / numActiveTasks
+    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+    // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+    val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
     // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
     val toGrant = math.min(maxToGrant, memoryFree)
-    if (curMem < poolSize / (2 * numActiveTasks)) {
+    if (curMem < minMemoryPerTask) {
--- End diff --
The current code is hard to understand; I can prove that it is equivalent to mine.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163716622
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10647][MESOS] Fix zookeeper dir with me...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10057#issuecomment-163689616
**[Test build #47497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47497/consoleFull)** for PR 10057 at commit [`b8fc74c`](https://github.com/apache/spark/commit/b8fc74c4f2d0e648b439ba722230e8e865ccca76).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11563] [core] [repl] Use RpcEnv to tran...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9923#issuecomment-163690659
**[Test build #47525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47525/consoleFull)** for PR 9923 at commit [`08a74e5`](https://github.com/apache/spark/commit/08a74e5606a4df2317040ff270e5a9bfd7f6efd2).
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163690621
**[Test build #47527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47527/consoleFull)** for PR 10204 at commit [`c5294a9`](https://github.com/apache/spark/commit/c5294a91a52124fa45cb32bd5799d6f1c0374fd0).
[GitHub] spark pull request: [SPARK-12198] [SparkR] SparkR support read.par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10191#issuecomment-163699540
**[Test build #47518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47518/consoleFull)** for PR 10191 at commit [`9e0fd63`](https://github.com/apache/spark/commit/9e0fd637c97ea269398db7469499aa4d7e3dda45).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9003] [MLlib] Add mapActive{Pairs,Value...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7357#issuecomment-163701403
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47502/
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163701442
ok, retest this please
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163701557
Last commit actually passed tests last night.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163705815
**[Test build #47529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47529/consoleFull)** for PR 10208 at commit [`2c31643`](https://github.com/apache/spark/commit/2c3164386040b5051e0332652cff9d2052b90cdb).
[GitHub] spark pull request: [SPARK-12012][SQL] Backports PR #10004 to bran...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10250#issuecomment-163708398
I have merged it. Let's close this PR.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163710755
**[Test build #47531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)** for PR 10209 at commit [`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8).
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47267922
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }

+object Join {
+  def apply(
+      left: LogicalPlan,
+      right: LogicalPlan,
+      joinType: JoinType,
+      condition: Option[Expression]): Join = {
+    Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
     left: LogicalPlan,
     right: LogicalPlan,
     joinType: JoinType,
-    condition: Option[Expression]) extends BinaryNode {
+    condition: Option[Expression],
+    generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --
This is semi-public API, because I think some advanced projects do dig into catalyst, and we've never changed the signature of something as basic as `Join` before. Could we do this instead by fixing nullability propagation and only inserting the filter if the attribute is `nullable`?
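The optimization under discussion, inserting IS NOT NULL filters on equi-join keys, is semantics-preserving because SQL equality is null-rejecting: a null key can never satisfy the join condition, so null-keyed rows can be dropped before the join. A minimal Python sketch (hypothetical helper names, not Spark code) illustrating this:

```python
def equi_join(left, right, key):
    # Naive nested-loop equi-join. SQL equality is null-rejecting:
    # NULL = anything is not true, so null keys never produce a match.
    return [
        (l, r)
        for l in left
        for r in right
        if l[key] is not None and l[key] == r[key]
    ]

def join_with_null_filter(left, right, key):
    # The optimization: filter null-keyed rows out *before* joining.
    # The result is unchanged, but the filter can now be pushed below
    # the join (e.g. into a scan) to skip useless rows early.
    left_nn = [l for l in left if l[key] is not None]
    right_nn = [r for r in right if r[key] is not None]
    return equi_join(left_nn, right_nn, key)

left = [{"id": 1}, {"id": None}, {"id": 2}]
right = [{"id": 1}, {"id": None}]
# Both strategies produce the same single matching pair.
assert equi_join(left, right, "id") == join_with_null_filter(left, right, "id")
```

marmbrus's point is that if nullability propagation were fixed, the filter would only need to be inserted when the key attribute is actually `nullable`, avoiding the signature change to `Join`.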
[GitHub] spark pull request: [SPARK-12227][SQL] Support drop multiple colum...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10218#issuecomment-163718935
I'm not sure this is worth the complexity. I think most users will only ever drop by name (since dropping a complex expression doesn't really make sense), and in that case constructing a column is strictly more typing.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163719076
In addition to moving ml-intro back to ml-guide, it'd be nice if the sidebar had links back to the main spark.ml and spark.mllib pages. That could be done in a separate JIRA/PR, if you prefer.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47268378
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -99,6 +99,13 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
    */
   lazy val resolved: Boolean = expressions.forall(_.resolved) && childrenResolved

+  /**
+   * Returns true if the two plans are semantically equal. This should ignore state generated
+   * during planning to help the planning process.
+   * TODO: implement this as a pass that canonicalizes the plan tree instead?
+   */
+  def semanticEquals(other: LogicalPlan): Boolean = this == other
--- End diff --
Oh, this is a new semantic equals. How is this different than `sameResult`? Maybe we should unify the naming between Expression and LogicalPlan for this concept.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722755
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47534/
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726187
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163728332
I'd try the following locally: `build/sbt scalastyle test:scalastyle catalyst/test sql/test`. Each of those commands can also be run separately, and you can prefix one with `~` to rerun it whenever something changes and iterate more quickly: `build/sbt ~scalastyle`.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163729292
**[Test build #47538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47538/consoleFull)** for PR 10052 at commit [`bd453d5`](https://github.com/apache/spark/commit/bd453d5f6744aa8fdd03b5ee2ecd44b471165eb4).
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47275498
--- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
     val numActiveTasks = memoryForTask.keys.size
     val curMem = memoryForTask(taskAttemptId)
-    // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-    // don't let it be negative
-    val maxToGrant =
-      math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+    // In every iteration of this loop, we should first try to reclaim any borrowed execution
+    // space from storage. This is necessary because of the potential race condition where new
+    // storage blocks may steal the free execution memory that this task was waiting for.
+    maybeGrowPool(numBytes - memoryFree)
+
+    // Maximum size the pool would have after potentially growing the pool.
+    // This is used to compute the upper bound of how much memory each task can occupy. This
+    // must take into account potential free memory as well as the amount this pool currently
+    // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+    // we did not take into account space that could have been freed by evicting cached blocks.
+    val maxPoolSize = computeMaxPoolSize()
+    val maxMemoryPerTask = maxPoolSize / numActiveTasks
+    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+    // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+    val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
     // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
     val toGrant = math.min(maxToGrant, memoryFree)
-    if (curMem < poolSize / (2 * numActiveTasks)) {
+    if (curMem < minMemoryPerTask) {
--- End diff --
I was able to prove this myself. I summarized my thoughts in this gist: https://gist.github.com/andrewor14/aea58796dd25d2ec9f20 That said, I would still prefer to do this separately since this PR is already passing tests. :)
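The grant logic being reviewed can be modeled outside Spark. The following Python sketch (a hypothetical simplification of the Scala in the diff: it omits pool growth and block eviction, using a fixed pool size for both bounds) shows how each task's grant is clamped between a guaranteed minimum share of 1/(2N) and a fair maximum share of 1/N:

```python
def compute_grant(num_bytes, cur_mem, pool_size, memory_free, num_active_tasks):
    """Simplified model of ExecutionMemoryPool's per-task grant computation.

    With N active tasks, each task may occupy at most 1/N of the pool and
    is guaranteed up to 1/(2N) before it is forced to wait.
    """
    max_memory_per_task = pool_size // num_active_tasks
    min_memory_per_task = pool_size // (2 * num_active_tasks)

    # Never grant more than requested, never push the task above its
    # 1/N cap, and never let the grant go negative.
    max_to_grant = min(num_bytes, max(0, max_memory_per_task - cur_mem))
    # Only hand out memory that is actually free right now.
    to_grant = min(max_to_grant, memory_free)

    # If the request is not fully satisfied and the task would still sit
    # below its guaranteed 1/(2N) share, the real code blocks and retries.
    would_block = to_grant < num_bytes and cur_mem + to_grant < min_memory_per_task
    return to_grant, would_block

# Pool of 1000 bytes, 2 active tasks: each capped at 500, guaranteed 250.
print(compute_grant(num_bytes=600, cur_mem=0, pool_size=1000,
                    memory_free=1000, num_active_tasks=2))
# → (500, False): the 600-byte request is capped at the 1/N fair share.
```

The point of the equivalence both reviewers mention is that clamping the grant to `max_memory_per_task - cur_mem` up front yields the same blocking behavior as the earlier formulation that compared `cur_mem` against `poolSize / (2 * numActiveTasks)` inline.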
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47537/ Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47276008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---

```diff
@@ -165,155 +134,52 @@ abstract class AggregationIterator(
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //   groupingKeyAttributes ++
-        //     allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // Final-Complete
-      case (Some(Final), Some(Complete)) =>
-        val completeAggregateFunctions: Array[AggregateFunction] =
-          allAggregateFunctions.takeRight(completeAggregateExpressions.length)
-        // All imperative aggregate functions with mode Complete.
-        val completeImperativeAggregateFunctions: Array[ImperativeAggregate] =
-          completeAggregateFunctions.collect { case func: ImperativeAggregate => func }
-
-        // The first initialInputBufferOffset values of the input aggregation buffer is
-        // for grouping expressions and distinct columns.
-        val groupingAttributesAndDistinctColumns = valueAttributes.take(initialInputBufferOffset)
-
-        val completeOffsetExpressions =
-          Seq.fill(completeAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-        // We do not touch buffer values of aggregate functions with the Final mode.
-        val finalOffsetExpressions =
-          Seq.fill(nonCompleteAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-
-        val mergeInputSchema =
-          aggregationBufferSchema ++
-            groupingAttributesAndDistinctColumns ++
-            nonCompleteAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        val mergeExpressions =
-          nonCompleteAggregateFunctions.flatMap {
-
```
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734167 **[Test build #47538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47538/consoleFull)** for PR 10052 at commit [`bd453d5`](https://github.com/apache/spark/commit/bd453d5f6744aa8fdd03b5ee2ecd44b471165eb4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734214 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733865 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733716 **[Test build #47529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47529/consoleFull)** for PR 10208 at commit [`2c31643`](https://github.com/apache/spark/commit/2c3164386040b5051e0332652cff9d2052b90cdb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163737620 That's the only remaining issue I found. I checked against the Spark 1.5 doc links as well.
[GitHub] spark pull request: [SPARK-11563] [core] [repl] Use RpcEnv to tran...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9923#issuecomment-163716405 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163718318 ok to test
[GitHub] spark pull request: [SPARK-12256] [SQL] Code refactoring: naming b...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10243#issuecomment-163720242 +1 to @rxin concerns on wrapping. A good rule of thumb is to always break at the highest syntactic level (not in the middle of some construct like a list of arguments). Otherwise you break things up that actually belong together and create an artificial separation.

```scala
// No
def getPath: Expression = path.getOrElse(BoundReference(0,
  inferDataType(typeToken)._1, nullable = true))

// Yes
def getPath: Expression =
  path.getOrElse(BoundReference(0, inferDataType(typeToken)._1, nullable = true))
```
[GitHub] spark pull request: [SPARK-12248][CORE] Adds limits per cpu for me...
Github user drcrallen commented on the pull request: https://github.com/apache/spark/pull/10232#issuecomment-163724877 Doesn't affect heap memory properly, closing until fixed
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726273 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726125 **[Test build #47533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47533/consoleFull)** for PR 10228 at commit [`3f60962`](https://github.com/apache/spark/commit/3f60962c2fd2f8f140714d0010dd0bb424b034b0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12257][SQL] Non partitioned insert into...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/10254#discussion_r47274373 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```diff
@@ -155,6 +155,11 @@ case class InsertIntoHiveTable(
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
     val partitionColumnNames = Option(partitionColumns).map(_.split("/")).orNull

+    // Validate that partition values are specified for partition columns.
+    if (partitionColumnNames != null && partitionColumnNames.size > 0 && partitionSpec.size == 0) {
+      throw new SparkException(ErrorMsg.NEED_PARTITION_ERROR.getMsg)
```

--- End diff -- @marmbrus Thanks. Actually, right after the code block I changed, there are a few places where we raise SparkException, so I thought there might be a reason for it and followed suit. :-) I will change all those places as well.
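The check in the diff above can be sketched in isolation. This is an illustrative stand-in only: the exception type and message here are placeholders, not Spark's actual `SparkException`/`ErrorMsg.NEED_PARTITION_ERROR` machinery, and which exception type the reviewer ultimately preferred is not shown in this thread:

```scala
// Standalone sketch: inserting into a partitioned table without any
// partition spec should fail fast, before any data is written.
object PartitionSpecCheck {
  def validate(partitionColumnNames: Seq[String], partitionSpec: Map[String, String]): Unit = {
    if (partitionColumnNames.nonEmpty && partitionSpec.isEmpty) {
      // Placeholder exception; the real code throws a Spark-specific type.
      throw new IllegalArgumentException(
        "need to specify partition columns because the destination table is partitioned")
    }
  }
}
```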
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47274795 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---

```diff
@@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {
         1.0
       }
       logDebug("fraction of data used for calculating quantiles = " + fraction)
-      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
+      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt())
     } else {
-      new Array[LabeledPoint](0)
+      input.sparkContext.emptyRDD[LabeledPoint]
     }

-    val splits = new Array[Array[Split]](numFeatures)
-
-    // Find all splits.
-    // Iterate over all features.
-    var featureIndex = 0
-    while (featureIndex < numFeatures) {
-      if (metadata.isContinuous(featureIndex)) {
-        val featureSamples = sampledInput.map(_.features(featureIndex))
-        val featureSplits = findSplitsForContinuousFeature(featureSamples, metadata, featureIndex)
+    findSplitsBinsBySorting(sampledInput, metadata, continuousFeatures)
+  }

-        val numSplits = featureSplits.length
-        logDebug(s"featureIndex = $featureIndex, numSplits = $numSplits")
-        splits(featureIndex) = new Array[Split](numSplits)
+  private def findSplitsBinsBySorting(
+      input: RDD[LabeledPoint],
+      metadata: DecisionTreeMetadata,
+      continuousFeatures: IndexedSeq[Int]): Array[Array[Split]] = {
+
+    val continuousSplits = {
+      // reduce the parallelism for split computations when there are less
+      // continuous features than input partitions. this prevents tasks from
+      // being spun up that will definitely do no work.
+      val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
+
+      input
+        .flatMap(point => continuousFeatures.map(idx => (idx, point.features(idx))))
+        .groupByKey(numPartitions)
+        .map { case (idx, samples) =>
+          val thresholds = findSplitsForContinuousFeature(samples.toArray, metadata, idx)
+          val splits: Array[Split] = thresholds.map(thresh => new ContinuousSplit(idx, thresh))
+          logDebug(s"featureIndex = $idx, numSplits = ${splits.length}")
+          (idx, splits)
+        }.collectAsMap()
+    }

-        var splitIndex = 0
-        while (splitIndex < numSplits) {
-          val threshold = featureSplits(splitIndex)
-          splits(featureIndex)(splitIndex) = new ContinuousSplit(featureIndex, threshold)
-          splitIndex += 1
-        }
-      } else {
-        // Categorical feature
-        if (metadata.isUnordered(featureIndex)) {
-          val numSplits = metadata.numSplits(featureIndex)
-          val featureArity = metadata.featureArity(featureIndex)
-          // TODO: Use an implicit representation mapping each category to a subset of indices.
-          // I.e., track indices such that we can calculate the set of bins for which
-          // feature value x splits to the left.
-          // Unordered features
-          // 2^(maxFeatureValue - 1) - 1 combinations
-          splits(featureIndex) = new Array[Split](numSplits)
-          var splitIndex = 0
-          while (splitIndex < numSplits) {
-            val categories: List[Double] =
-              extractMultiClassCategories(splitIndex + 1, featureArity)
-            splits(featureIndex)(splitIndex) =
-              new CategoricalSplit(featureIndex, categories.toArray, featureArity)
-            splitIndex += 1
-          }
-        } else {
-          // Ordered features
-          // Bins correspond to feature values, so we do not need to compute splits or bins
-          // beforehand. Splits are constructed as needed during training.
-          splits(featureIndex) = new Array[Split](0)
+    val numFeatures = metadata.numFeatures
+    val splits = Range(0, numFeatures).map {
+      case i if metadata.isContinuous(i) =>
+        val split = continuousSplits(i)
+        metadata.setNumSplits(i, split.length)
+        split
+
+      case i if metadata.isCategorical(i) && metadata.isUnordered(i) =>
+        // Unordered features
+        // 2^(maxFeatureValue - 1) - 1 combinations
+        val featureArity = metadata.featureArity(i)
+        val split: IndexedSeq[Split] = Range(0, metadata.numSplits(i)).map { splitIndex =>
```

--- End diff -- You could use an `Array.tabulate` here. Something like

```scala
Array.tabulate[Split](numSplits(i)) { splitIndex =>
  ...
}
```
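For reference, `Array.tabulate` builds an array by applying a function to each index, which is what the suggestion above leans on. A standalone sketch follows; Spark's `Split`/`CategoricalSplit` types are replaced with plain lists, and the bit-decoding stand-in for `extractMultiClassCategories` is an assumption about its behavior, not the real implementation:

```scala
// Standalone demo of the Array.tabulate idiom suggested above, replacing
// a manual while-loop over splitIndex.
object TabulateDemo {
  // Stand-in for extractMultiClassCategories(splitIndex + 1, featureArity):
  // interpret the set bits of (splitIndex + 1) as the categories that go left.
  def categories(splitIndex: Int, featureArity: Int): List[Double] =
    (0 until featureArity)
      .filter(bit => (((splitIndex + 1) >> bit) & 1) == 1)
      .map(_.toDouble)
      .toList

  // One entry per split, built directly from its index.
  def allSplits(numSplits: Int, featureArity: Int): Array[List[Double]] =
    Array.tabulate(numSplits)(splitIndex => categories(splitIndex, featureArity))
}
```

With `featureArity = 3` there are 2^(3-1) - 1 = 3 unordered splits, each identified by the bit pattern of its index.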
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731702 **[Test build #47537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47537/consoleFull)** for PR 10234 at commit [`8432ac9`](https://github.com/apache/spark/commit/8432ac947a2ee469dbc4082a4fa702da82f44ebe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `* more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.`
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731884 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731260 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47276459 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---

```diff
@@ -49,41 +47,20 @@ abstract class AggregationIterator(
   // Initializing functions.
   ///

-  // An Seq of all AggregateExpressions.
-  // It is important that all AggregateExpressions with the mode Partial, PartialMerge or Final
-  // are at the beginning of the allAggregateExpressions.
-  protected val allAggregateExpressions =
-    nonCompleteAggregateExpressions ++ completeAggregateExpressions
-  require(
-    allAggregateExpressions.map(_.mode).distinct.length <= 2,
-    s"$allAggregateExpressions are not supported becuase they have more than 2 distinct modes.")
-
-  /**
-   * The distinct modes of AggregateExpressions. Right now, we can handle the following mode:
```

--- End diff -- Can you add a similar comment for the new version? Which combinations are valid now?
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47538/ Test FAILed.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10236
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47278589 --- Diff: docs/ml-survival-regression.md ---

```diff
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Survival Regression - ML
-displayTitle: ML - Survival Regression
+title: Survival Regression - spark.ml
+displayTitle: Survival Regression - spark.ml
```

--- End diff -- This doc should now be a redirect to the ml-classification-regression.html#survival-regression section. Also, it looks like some of the math renders incorrectly, but let's fix that in a follow-up.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47268739 --- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---

```diff
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
       val numActiveTasks = memoryForTask.keys.size
       val curMem = memoryForTask(taskAttemptId)

-      // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-      // don't let it be negative
-      val maxToGrant =
-        math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+      // In every iteration of this loop, we should first try to reclaim any borrowed execution
+      // space from storage. This is necessary because of the potential race condition where new
+      // storage blocks may steal the free execution memory that this task was waiting for.
+      maybeGrowPool(numBytes - memoryFree)
+
+      // Maximum size the pool would have after potentially growing the pool.
+      // This is used to compute the upper bound of how much memory each task can occupy. This
+      // must take into account potential free memory as well as the amount this pool currently
+      // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+      // we did not take into account space that could have been freed by evicting cached blocks.
+      val maxPoolSize = computeMaxPoolSize()
+      val maxMemoryPerTask = maxPoolSize / numActiveTasks
+      val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+      // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+      val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
       // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
       val toGrant = math.min(maxToGrant, memoryFree)

-      if (curMem < poolSize / (2 * numActiveTasks)) {
+      if (curMem < minMemoryPerTask) {
```

--- End diff -- yeah, I agree, though it's something we can always fix separately so we don't block the release.
Let's defer the judgment to @JoshRosen.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47268048 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala ---

```diff
@@ -43,7 +43,7 @@ abstract class PlanTest extends SparkFunSuite {
   protected def comparePlans(plan1: LogicalPlan, plan2: LogicalPlan) {
     val normalized1 = normalizeExprIds(plan1)
     val normalized2 = normalizeExprIds(plan2)
-    if (normalized1 != normalized2) {
+    if (!normalized1.semanticEquals(normalized2)) {
```

--- End diff -- Existing: do we need this hacky normalization logic above anymore? I don't think `semanticEquals` existed when I wrote this.
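The distinction in this hunk, structural equality versus ID-insensitive equality, can be illustrated without Catalyst. A toy sketch follows (the real `semanticEquals` on Catalyst expressions does considerably more than comparing names):

```scala
// Toy model of comparePlans: structural equality fails when auto-generated
// expression IDs differ, so plan tests either normalize the IDs first or
// use an ID-insensitive ("semantic") comparison.
case class Attr(name: String, exprId: Long) {
  def semanticEquals(other: Attr): Boolean = name == other.name
}

object PlanCompare {
  // The normalizeExprIds-style approach: rewrite every ID to a canonical value.
  def normalize(a: Attr): Attr = a.copy(exprId = 0L)
}
```

Either mechanism makes two otherwise-identical plans compare equal, which is why the comment asks whether keeping both is still necessary.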
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163720246 **[Test build #47533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47533/consoleFull)** for PR 10228 at commit [`3f60962`](https://github.com/apache/spark/commit/3f60962c2fd2f8f140714d0010dd0bb424b034b0).
[GitHub] spark pull request: [SPARK-11131] [core] Fix race in worker regist...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9138#issuecomment-163719606 @vanzin just found an issue about this change. Now if the master receives `RegisterWorker`, it won't use the `workerRef` to send the reply. So there is no connection from `Master` to the server in `Worker`. If the `Worker` is killed now, `Master` only observes some client is lost, but the address is just a client address in Worker and won't match the Worker address. So `Master` cannot remove this dead `Worker` at once. However, this Worker will be removed in 60 seconds because of no heartbeat. See the log here: https://www.mail-archive.com/dev@spark.apache.org/msg12332.html
[GitHub] spark pull request: [SPARK-7727] [SQL] Avoid inner classes in Rule...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10174#discussion_r47270183 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/DefaultOptimizerExtendableSuite.scala ---

```diff
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.optimizer.{DefaultOptimizer, Optimizer}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Batch
+
+/**
+ * This is a test for SPARK-7727 if the Default Optimizer is kept being extendable
+ */
+class DefaultOptimizerExtendableSuite extends SparkFunSuite{
+
+  /**
+   * This class represents a dummy extended optimizer that takes the rules of the
+   * DefaultOptimizer and adds custom ones.
+   */
+  class ExtendedOptimizer extends Optimizer{
```

--- End diff -- Nit: space before `{`
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722742 **[Test build #47534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47534/consoleFull)** for PR 10052 at commit [`8a5a4f6`](https://github.com/apache/spark/commit/8a5a4f63fc66792e924f4f3355df357815aae13b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722112 **[Test build #47534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47534/consoleFull)** for PR 10052 at commit [`8a5a4f6`](https://github.com/apache/spark/commit/8a5a4f63fc66792e924f4f3355df357815aae13b).
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163722357 **[Test build #47535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47535/consoleFull)** for PR 10234 at commit [`c75a5ca`](https://github.com/apache/spark/commit/c75a5ca68cec48574cddeb9f1cb8695b8d44e9ea).
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163723975 @jkbradley done: https://cloud.githubusercontent.com/assets/7594753/11725710/e9949f04-9f2f-11e5-8ba5-7f955e8b41fa.png
[GitHub] spark pull request: [SPARK-11923][ML] Python API for ml.feature.Ch...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10186#discussion_r47273371

--- Diff: python/pyspark/ml/feature.py ---

@@ -2093,6 +2093,95 @@ class RFormulaModel(JavaModel):
     """
 
+@inherit_doc
+class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
+    """
+    .. note:: Experimental
+
+    Chi-Squared feature selection, which selects categorical features to use for predicting a
+    categorical label.
+
+    >>> from pyspark.mllib.linalg import Vectors
+    >>> df = sqlContext.createDataFrame(
+    ...     [(Vectors.dense([0.0, 0.0, 18.0, 1.0]), 1.0),
+    ...      (Vectors.dense([0.0, 1.0, 12.0, 0.0]), 0.0),
+    ...      (Vectors.dense([1.0, 0.0, 15.0, 0.1]), 0.0)],
+    ...     ["features", "label"])
+    >>> selector = ChiSqSelector(numTopFeatures=1, outputCol="selectedFeatures")
+    >>> model = selector.fit(df)
+    >>> model.transform(df).collect()[0].selectedFeatures
+    DenseVector([1.0])
+    >>> model.transform(df).collect()[1].selectedFeatures
+    DenseVector([0.0])
+    >>> model.transform(df).collect()[2].selectedFeatures
+    DenseVector([0.1])
+
+    .. versionadded:: 1.6.0
+    """
+
+    # a placeholder to make it appear in the generated doc
+    numTopFeatures = \
+        Param(Params._dummy(), "numTopFeatures",
+              "Number of features that selector will select, ordered by statistics value " +
+              "descending. If the number of features is < numTopFeatures, then this will select " +
+              "all features.")
+
+    @keyword_only
+    def __init__(self, numTopFeatures=50, featuresCol="features", outputCol=None, labelCol="label"):
+        """
+        __init__(self, numTopFeatures=50, featuresCol="features", outputCol=None, labelCol="label")
+        """
+        super(ChiSqSelector, self).__init__()
+        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.ChiSqSelector", self.uid)
+        self.numTopFeatures = \
+            Param(self, "numTopFeatures",
+                  "Number of features that selector will select, ordered by statistics value " +
+                  "descending. If the number of features is < numTopFeatures, then this will " +
+                  "select all features.")
+        kwargs = self.__init__._input_kwargs
+        self.setParams(**kwargs)
+
+    @keyword_only
+    @since("1.6.0")
+    def setParams(self, numTopFeatures=50, featuresCol="features", outputCol=None,
+                  labelCol="labels"):
+        """
+        setParams(self, numTopFeatures=50, featuresCol="features", outputCol=None,\
+                  labelCol="labels")
+        Sets params for this ChiSqSelector.
+        """
+        kwargs = self.setParams._input_kwargs
+        return self._set(**kwargs)
+
+    @since("1.6.0")
+    def setNumTopFeatures(self, value):
+        """
+        Sets the value of :py:attr:`numTopFeatures`.
+        """
+        self._paramMap[self.numTopFeatures] = value
+        return self
+
+    @since("1.6.0")
+    def getNumTopFeatures(self):
+        """
+        Gets the value of numTopFeatures or its default value.
+        """
+        return self.getOrDefault(self.numTopFeatures)
+
+    def _create_model(self, java_model):
+        return ChiSqSelectorModel(java_model)
+
+
+class ChiSqSelectorModel(JavaModel):

--- End diff --

This model is loadable and saveable in Java; I don't see us doing this elsewhere in ml/ yet (although we do it in mllib/), but do we maybe want to use the JavaLoader & JavaSaveable base classes?
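The mixin pattern holdenk suggests, where save/load live once in base classes and each model merely inherits them, can be sketched in plain Python. All names below are illustrative; the real `JavaSaveable`/`JavaLoader` mixins in `pyspark.mllib.util` delegate to JVM objects and take a SparkContext and a path.

```python
# Hypothetical sketch of the save/load mixin pattern: serialization logic is
# written once in base classes, and each model only inherits it. Names are
# illustrative; the real pyspark.mllib.util mixins delegate to the JVM.
import json


class Saveable:
    def save(self, store, path):
        # Serialize this model's attributes into a shared store keyed by path.
        store[path] = json.dumps(self.__dict__)


class Loader:
    @classmethod
    def load(cls, store, path):
        # Rebuild an instance without calling __init__, then restore state.
        model = cls.__new__(cls)
        model.__dict__.update(json.loads(store[path]))
        return model


class SelectorModel(Saveable, Loader):
    def __init__(self, selected_indices):
        self.selected_indices = selected_indices


store = {}
SelectorModel([2]).save(store, "/tmp/model")   # path is illustrative
restored = SelectorModel.load(store, "/tmp/model")
assert restored.selected_indices == [2]
```

The benefit is the one holdenk points at: a new model class such as `ChiSqSelectorModel` would get persistence by inheritance instead of repeating it per class.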
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user dereksabryfb commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163727440 Apologies, I haven't been able to run `./dev/run-tests`; it fails with the following exception: http://pastebin.com/L0p0sjtJ. So I wasn't able to catch the style issues locally, and I'm not sure whether there's anything else the build doesn't flag.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163727576 **[Test build #47537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47537/consoleFull)** for PR 10234 at commit [`8432ac9`](https://github.com/apache/spark/commit/8432ac947a2ee469dbc4082a4fa702da82f44ebe).
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731070 **[Test build #47528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47528/consoleFull)** for PR 10240 at commit [`d8be669`](https://github.com/apache/spark/commit/d8be66911d2abf3da46a25a54a7d80fd1eeebdfa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731262 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47528/ Test PASSed.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732061 **[Test build #47531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)** for PR 10209 at commit [`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47531/ Test FAILed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47529/ Test PASSed.
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163734579 LGTM
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163734657 Thanks! Merging to master.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163734799 The only change is to remove that `require`. I am merging it to master and branch 1.6.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163716624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47526/ Test PASSed.
[GitHub] spark pull request: [SPARK-12257][SQL] Non partitioned insert into...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10254#discussion_r47269588

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

@@ -155,6 +155,11 @@ case class InsertIntoHiveTable(
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
     val partitionColumnNames = Option(partitionColumns).map(_.split("/")).orNull
+    // Validate that partition values are specified for partition columns.
+    if (partitionColumnNames != null && partitionColumnNames.size > 0 && partitionSpec.size == 0) {
+      throw new SparkException(ErrorMsg.NEED_PARTITION_ERROR.getMsg)

--- End diff --

`AnalysisException` for anything that is thrown due to an invalid query.
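The point of marmbrus's review comment is that invalid queries should fail at analysis time with an `AnalysisException`, not at execution time with a generic `SparkException`. A hypothetical Python analogue of that validation (illustrative names only, mirroring the Scala check in the diff) looks like:

```python
# Hypothetical Python analogue of the check in the diff above, raising an
# analysis-time error (as marmbrus suggests) rather than a runtime one.
# Names are illustrative, not Spark's actual API.
class AnalysisException(Exception):
    """Raised for invalid queries, before any execution starts."""


def validate_partition_spec(partition_column_names, partition_spec):
    # A partitioned destination table requires a partition spec on insert.
    if partition_column_names and not partition_spec:
        raise AnalysisException(
            "need to specify partition columns because the destination "
            "table is partitioned")


validate_partition_spec([], {})                        # unpartitioned: fine
validate_partition_spec(["ds"], {"ds": "2015-12-10"})  # spec given: fine
try:
    validate_partition_spec(["ds"], {})
except AnalysisException:
    pass  # invalid insert rejected before execution
```

Failing during analysis gives the user an actionable error immediately instead of a failed job partway through execution.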
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722753 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163724932 **[Test build #47536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47536/consoleFull)** for PR 10228 at commit [`a9eae30`](https://github.com/apache/spark/commit/a9eae303166d6c3ba1f80a22265482b9f4d0a525).
[GitHub] spark pull request: [SPARK-12248][CORE] Adds limits per cpu for me...
Github user drcrallen closed the pull request at: https://github.com/apache/spark/pull/10232
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726109 **[Test build #47535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47535/consoleFull)** for PR 10234 at commit [`c75a5ca`](https://github.com/apache/spark/commit/c75a5ca68cec48574cddeb9f1cb8695b8d44e9ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `* more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.`
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726189 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47533/ Test FAILed.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47535/ Test PASSed.
[GitHub] spark pull request: [SPARK-12149] [Web UI] Executor UI improvement...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10154#discussion_r47274898

--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala ---

@@ -33,11 +33,13 @@ private[ui] case class ExecutorSummaryInfo(
     rddBlocks: Int,
     memoryUsed: Long,
     diskUsed: Long,
+    totalCores: Int,

--- End diff --

So the comment for this case class says it isn't used anymore - do we really need to update it?
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732156 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12235][SPARKR] Enhance mutate() to supp...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10220#issuecomment-163732757 Sure, I'll check. We were discussing a bit in SPARK-12235
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10204
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737331 **[Test build #47532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47532/consoleFull)** for PR 10238 at commit [`f6f1dab`](https://github.com/apache/spark/commit/f6f1dab2eede5147c2387efa4d02d92f6c7a5388). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737414 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47532/ Test FAILed.
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737413 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163738746 @davies please look at the final changes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163738626 retest this please
[GitHub] spark pull request: [SPARK-12258] [SQL] Hive Timestamp UDF is bind...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10249#issuecomment-163684431 Merged build finished. Test PASSed.
[GitHub] spark pull request: Make pyspark shell pythonstartup work under py...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10255#issuecomment-163684045 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11925] [ML] [PySpark] Add PySpark missi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9908#issuecomment-163690187 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692088 [Test build #47506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47506/console) for PR 9158 at commit [`9822a26`](https://github.com/apache/spark/commit/9822a26e0941a575387df03216e81d63f584eb57). * This patch **fails PySpark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class Pipeline(override val uid: String) extends Estimator[PipelineModel] with HasSeed `
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163691816 **[Test build #47526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47526/consoleFull)** for PR 10236 at commit [`e303d4c`](https://github.com/apache/spark/commit/e303d4ca88e1209d0eaf17a367deb52ee18f8717).
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692307 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47506/ Test FAILed.
[GitHub] spark pull request: [SPARK-11978] [ML] Move dataset_example.py to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9957#issuecomment-163693631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47514/ Test FAILed.