[GitHub] spark pull request: [SPARK-4952][Core]Handle ConcurrentModificatio...

2014-12-24 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3788

[SPARK-4952][Core]Handle ConcurrentModificationExceptions in 
SparkEnv.environmentDetails



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4952

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3788.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3788


commit d903529819f090288f6acfb666873f9ac01990be
Author: GuoQiang Li wi...@qq.com
Date:   2014-12-24T07:56:59Z

Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4952][Core]Handle ConcurrentModificatio...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3788#issuecomment-68034841
  
  [Test build #24778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24778/consoleFull) for PR 3788 at commit [`d903529`](https://github.com/apache/spark/commit/d903529819f090288f6acfb666873f9ac01990be).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68035013
  
  [Test build #24773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24773/consoleFull) for PR 3778 at commit [`8c0316f`](https://github.com/apache/spark/commit/8c0316f8454f0ac8268f98d9a4c9cc29baedbf5b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class CombinePredicate extends BinaryPredicate`
   * `case class And(left: Expression, right: Expression) extends CombinePredicate`
   * `case class Or(left: Expression, right: Expression) extends CombinePredicate`
   * `implicit class CombinePredicateExtension(source: CombinePredicate)`
   * `implicit class ExpressionCookies(expression: Expression)`
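For context on what such a normalization pass typically does (this is a hypothetical sketch, not the PR's actual `CombinePredicate` implementation): conjunction/disjunction normalization usually means distributing `Or` over `And` until the expression is in conjunctive normal form.

```java
// Illustrative sketch only: a minimal model of rewriting
// (x AND y) OR z into (x OR z) AND (y OR z), i.e. conjunctive normal form.
sealed interface Expr permits Lit, And, Or {}
record Lit(String name) implements Expr {}
record And(Expr left, Expr right) implements Expr {}
record Or(Expr left, Expr right) implements Expr {}

final class Cnf {
    static Expr toCnf(Expr e) {
        if (e instanceof And a) {
            return new And(toCnf(a.left()), toCnf(a.right()));
        }
        if (e instanceof Or o) {
            Expr l = toCnf(o.left());
            Expr r = toCnf(o.right());
            // Distribute OR over AND on either side, then re-normalize.
            if (l instanceof And a) {
                return new And(toCnf(new Or(a.left(), r)), toCnf(new Or(a.right(), r)));
            }
            if (r instanceof And a) {
                return new And(toCnf(new Or(l, a.left())), toCnf(new Or(l, a.right())));
            }
            return new Or(l, r);
        }
        return e; // literals are already normalized
    }
}
```

A pass like this makes predicates easier to compare and simplify, at the cost of possible expression blow-up.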






[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68035016
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24773/
Test FAILed.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68035062
  
  [Test build #24779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24779/consoleFull) for PR 3784 at commit [`4ab3a58`](https://github.com/apache/spark/commit/4ab3a58fe8a86bc8f08fa0007d88022b3021e0e6).
 * This patch merges cleanly.





[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/3789

[Minor] Fix a typo of type parameter in JavaUtils.scala

In JavaUtils.scala, there is a typo in a type parameter. In addition, the type information is removed at compile time by erasure.

This issue is really minor, so I didn't file it in JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark fix-typo-in-javautils

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3789.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3789


commit 99f6f6342b98156a5f7771b0dd0d50c4e0f21a8c
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-12-24T08:05:51Z

Fixed a typo of type parameter in JavaUtils.scala







[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3789#issuecomment-68035307
  
  [Test build #24780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24780/consoleFull) for PR 3789 at commit [`99f6f63`](https://github.com/apache/spark/commit/99f6f6342b98156a5f7771b0dd0d50c4e0f21a8c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3786#issuecomment-68035714
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24767/
Test PASSed.





[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3786#issuecomment-68035711
  
  [Test build #24767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24767/consoleFull) for PR 3786 at commit [`003774a`](https://github.com/apache/spark/commit/003774ab2dea5c0f6fd70e68c385178cc235d1c2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4949] [SPARK-4949] shutdownCallback in ...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3781#discussion_r22248980
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -31,16 +31,17 @@ private[spark] class SparkDeploySchedulerBackend(
   with AppClientListener
   with Logging {
 
-  var client: AppClient = null
-  var stopping = false
-  var shutdownCallback : (SparkDeploySchedulerBackend) => Unit = _
-  @volatile var appId: String = _
+  private var client: AppClient = null
+  private var stopping = false
+  private val shutdownCallbackLock = new Object()
+  private var shutdownCallback : (SparkDeploySchedulerBackend) => Unit = _
+  @volatile private var appId: String = _
 
-  val registrationLock = new Object()
-  var registrationDone = false
+  private val registrationLock = new Object()
--- End diff --

On the one hand, this sounds like it could be an `AtomicBoolean`. On the other hand, this whole mechanism could be replaced by something more robust from `java.util.concurrent`.
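To make the suggestion concrete, here is a minimal, hypothetical sketch (the class and method names are illustrative, not from the PR) of replacing a lock-object-plus-flag registration handshake with `java.util.concurrent.CountDownLatch`:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a CountDownLatch replaces the "lock object + boolean
// flag" wait/notify pattern. countDown() is idempotent and await() handles
// the memory-visibility guarantees that the manual pattern had to get right.
class RegistrationBarrier {
    private final CountDownLatch registered = new CountDownLatch(1);

    // Called by the callback thread once registration completes.
    void notifyRegistered() {
        registered.countDown();
    }

    // Called by the thread that must wait; returns false on timeout.
    boolean awaitRegistered(long timeoutMs) {
        try {
            return registered.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Compared with a bare boolean flag, the latch cannot miss a notification that happens before the waiter arrives.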





[GitHub] spark pull request: [SPARK-4949] [SPARK-4949] shutdownCallback in ...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3781#discussion_r22248985
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -82,8 +83,11 @@ private[spark] class SparkDeploySchedulerBackend(
 stopping = true
 super.stop()
 client.stop()
-if (shutdownCallback != null) {
-  shutdownCallback(this)
+
+shutdownCallbackLock.synchronized {
--- End diff --

This doesn't work since `shutdownCallbackLock` may be `null`.
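One lock-free alternative, sketched here purely as an illustration (the names are hypothetical, not the PR's API): hold the callback in an `AtomicReference`, so no separate lock object exists to be null, and the callback can be claimed atomically at most once.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: an AtomicReference holds the shutdown callback.
// getAndSet(null) claims it atomically, so it runs at most once even if
// stop() races with a concurrent setter or another stop() call.
class ShutdownHolder<T> {
    private final AtomicReference<Consumer<T>> callback = new AtomicReference<>();

    void setCallback(Consumer<T> cb) {
        callback.set(cb);
    }

    void runOnce(T target) {
        Consumer<T> cb = callback.getAndSet(null); // claim atomically
        if (cb != null) {
            cb.accept(target);
        }
    }
}
```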





[GitHub] spark pull request: [SPARK-4949] [SPARK-4949] shutdownCallback in ...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3781#discussion_r22249000
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -31,16 +31,17 @@ private[spark] class SparkDeploySchedulerBackend(
   with AppClientListener
   with Logging {
 
-  var client: AppClient = null
-  var stopping = false
-  var shutdownCallback : (SparkDeploySchedulerBackend) => Unit = _
-  @volatile var appId: String = _
+  private var client: AppClient = null
+  private var stopping = false
+  private val shutdownCallbackLock = new Object()
--- End diff --

Same for the new lock





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68036584
  
  [Test build #24769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24769/consoleFull) for PR 3787 at commit [`264e4e0`](https://github.com/apache/spark/commit/264e4e0ce01e5f41eb60413249219ff98864dc0c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68036586
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24769/
Test PASSed.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68036616
  
  [Test build #24774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24774/consoleFull) for PR 3319 at commit [`04c4829`](https://github.com/apache/spark/commit/04c4829d8364a36314485d6bdceed5ab93c67398).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68036621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24774/
Test FAILed.





[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2765#issuecomment-68036640
  
  [Test build #24770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24770/consoleFull) for PR 2765 at commit [`ce86bcc`](https://github.com/apache/spark/commit/ce86bcc5be8a790245787f75dfd2cba51ab50f55).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2765#issuecomment-68036643
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24770/
Test FAILed.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3787#discussion_r22249140
  
--- Diff: docs/building-spark.md ---
@@ -60,20 +60,29 @@ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
 mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
 {% endhighlight %}
 
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and other Hadoop versions with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
+For Apache Hadoop 2.2.0 and later and Cloudera CDH 5 with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
 
 Examples:
 
 {% highlight bash %}
 # Apache Hadoop 2.2.X
-mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
+mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.X -DskipTests clean package
--- End diff --

This is wrong since 2.2.X is not a version. This is intended to be an 
executable example.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3787#discussion_r22249130
  
--- Diff: docs/building-spark.md ---
@@ -60,20 +60,29 @@ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
 mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
 {% endhighlight %}
 
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and other Hadoop versions with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
+For Apache Hadoop 2.2.0 and later and Cloudera CDH 5 with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
--- End diff --

This is not only applicable to CDH *5*+, so I'd revert that addition. What was removed with `yarn-alpha` was not really Hadoop 0.23 support, although it kind of lines up with that. Why not remove this whole qualifying "For Apache Hadoop ..." phrase altogether? Also, do you mean Spark 1.2? What are you referring to in 1.3 otherwise?





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68036986
  
  [Test build #24771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24771/consoleFull) for PR 3787 at commit [`9ab0c24`](https://github.com/apache/spark/commit/9ab0c24e440972ba861ceae75767847fbce96f91).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ApplicationFinished(id: String)`






[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68036994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24771/
Test PASSed.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3787#discussion_r22249225
  
--- Diff: docs/building-spark.md ---
@@ -60,20 +60,29 @@ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
 mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
 {% endhighlight %}
 
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and other Hadoop versions with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
+For Apache Hadoop 2.2.0 and later and Cloudera CDH 5 with YARN, you can enable the yarn profile and optionally set the yarn.version property if it is different from hadoop.version. As of Spark 1.3, Spark only supports YARN versions 2.2.0 and later.
 
 Examples:
 
 {% highlight bash %}
 # Apache Hadoop 2.2.X
-mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
+mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.X -DskipTests clean package
 
 # Apache Hadoop 2.3.X
-mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
+mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.X -DskipTests clean package
 
 # Apache Hadoop 2.4.X or 2.5.X
 mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package
 
+# Cloudera CDH 5.0.X
+mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.X -DskipTests clean package
+
+# Cloudera CDH 5.1.X
+mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.X -DskipTests clean package
+
+# Cloudera CDEH 5.2.X or 5.3.X
--- End diff --

This has a typo in CDEH, and these examples are also not runnable. I don't see much value in elaborating this example three more times.

(As a related aside, I would like to see less, not more, vendor stuff in 
Spark anyway. Adding just this text unduly favors Cloudera a tiny bit; the 
alternative is to write a bunch of other vendor combos here, which is going to 
turn into at least a maintenance headache. I already disagree with maintaining 
vendor versioning info in the project POM.)





[GitHub] spark pull request: [SPARK-4951][Core] Fix the issue that a busy e...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3783#issuecomment-68037376
  
  [Test build #24772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24772/consoleFull) for PR 3783 at commit [`105ba3a`](https://github.com/apache/spark/commit/105ba3acea521a77122a016faa6674793d1ff696).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4951][Core] Fix the issue that a busy e...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3783#issuecomment-68037381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24772/
Test PASSed.





[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3789#discussion_r22249340
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -80,7 +80,7 @@ private[spark] object JavaUtils {
   prev match {
 case Some(k) =>
   underlying match {
-case mm: mutable.Map[a, _] =>
+case mm: mutable.Map[_, _] =>
--- End diff --

Should this really be `A`, to express the relation to the generic bound? Then again, `underlying` must already have keys of type `A`. It just looks like that was the intent.
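The erasure point can be illustrated with a Java analogue (illustrative only; the PR itself is Scala): a runtime type test can only see the raw type, so the type arguments in the test must be wildcards.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: generic type arguments are erased at runtime, so an
// instanceof check (like a Scala type pattern) can only inspect the raw type.
// A test against Map<String, ?> is rejected by the compiler; only unbounded
// wildcards are allowed here.
class ErasureDemo {
    static boolean isMap(Object underlying) {
        return underlying instanceof Map<?, ?>;
    }
}
```

This is why the Scala pattern uses `mutable.Map[_, _]`: whatever name is written for the key type, it is unchecked at runtime.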





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68037960
  
  [Test build #24776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24776/consoleFull) for PR 3784 at commit [`3cf7937`](https://github.com/apache/spark/commit/3cf7937bf2c2631b3a313e5873d7f7d0b853203f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68037961
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24776/
Test PASSed.





[GitHub] spark pull request: [SPARK-4952][Core]Handle ConcurrentModificatio...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3788#discussion_r22249654
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -395,7 +395,7 @@ object SparkEnv extends Logging {
 val sparkProperties = (conf.getAll ++ schedulerMode).sorted
 
 // System properties that are not java classpaths
-val systemProperties = System.getProperties.iterator.toSeq
+val systemProperties = Utils.getSystemProperties.toSeq
--- End diff --

It wasn't clear to me at first whether this is the culprit, but it appears to be, since 
the underlying object being modified is a `java.util.Properties`. The 
defensive copy made in `Utils` should be thread-safe, in the sense that 
`Hashtable.clone()` is `synchronized`. LGTM.
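The idea can be sketched in plain Scala. This is an illustrative stand-in, not Spark's actual `Utils` code: `Hashtable.clone()` is `synchronized`, so the snapshot is taken atomically even while another thread mutates the live system properties, and iterating the snapshot can no longer throw `ConcurrentModificationException`.

```scala
import java.util.Properties

object DefensiveCopyExample {
  // Illustrative helper (names are hypothetical, not Spark's Utils API):
  // clone the live Properties first, then iterate only the private snapshot.
  def getSystemProperties: Map[String, String] = {
    val snapshot = System.getProperties.clone().asInstanceOf[Properties]
    var result = Map.empty[String, String]
    val it = snapshot.stringPropertyNames().iterator()
    while (it.hasNext) {
      val k = it.next()
      result += (k -> snapshot.getProperty(k))
    }
    result
  }

  def main(args: Array[String]): Unit = {
    System.setProperty("example.key", "example.value")
    println(getSystemProperties("example.key"))
  }
}
```

Iterating `System.getProperties` directly, as the old code did, walks the live `Hashtable`, which is exactly where the race came from.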





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68038204
  
  [Test build #24775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24775/consoleFull)
 for   PR 3319 at commit 
[`b0354f6`](https://github.com/apache/spark/commit/b0354f616f7f49ee9b19f6b8e5d0dc775b05dba2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4409][MLlib] Additional Linear Algebra ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3319#issuecomment-68038211
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24775/
Test PASSed.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68038291
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24777/
Test PASSed.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68038290
  
  [Test build #24777 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24777/consoleFull)
 for   PR 3784 at commit 
[`0e51101`](https://github.com/apache/spark/commit/0e511019c6ee3faadd81f860adde5ee7bc6e4778).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4954][Core] add spark version infomatio...

2014-12-24 Thread liyezhang556520
GitHub user liyezhang556520 opened a pull request:

https://github.com/apache/spark/pull/3790

[SPARK-4954][Core] add spark version infomation in log for standalone mode

The master and worker Spark version may not be the same as the driver's Spark 
version, because the Spark jar file might be replaced for a new application 
without restarting the cluster. So the Spark version should be logged 
in both the Master and Worker logs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyezhang556520/spark version4Standalone

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3790


commit e05e1e3e0bf747adc1f0c3e4c6461e92b5368c23
Author: Zhang, Liye liye.zh...@intel.com
Date:   2014-12-24T08:46:06Z

add spark version infomation in log for standalone mode







[GitHub] spark pull request: [SPARK-4954][Core] add spark version infomatio...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3790#issuecomment-68038435
  
  [Test build #24781 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24781/consoleFull)
 for   PR 3790 at commit 
[`e05e1e3`](https://github.com/apache/spark/commit/e05e1e3e0bf747adc1f0c3e4c6461e92b5368c23).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68039132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24779/
Test PASSed.





[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68039127
  
  [Test build #24779 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24779/consoleFull)
 for   PR 3784 at commit 
[`4ab3a58`](https://github.com/apache/spark/commit/4ab3a58fe8a86bc8f08fa0007d88022b3021e0e6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4952][Core]Handle ConcurrentModificatio...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3788#issuecomment-68039339
  
  [Test build #24778 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24778/consoleFull)
 for   PR 3788 at commit 
[`d903529`](https://github.com/apache/spark/commit/d903529819f090288f6acfb666873f9ac01990be).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4952][Core]Handle ConcurrentModificatio...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3788#issuecomment-68039346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24778/
Test PASSed.





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-68039924
  
@JoshRosen I found that removing the `SPARK_HOME` config doesn't seem to 
matter; the REPL and YARN tests still pass. OK to remove that config in this 
PR, do you think? 





[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3789#discussion_r22250287
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -80,7 +80,7 @@ private[spark] object JavaUtils {
   prev match {
 case Some(k) =
   underlying match {
-case mm: mutable.Map[a, _] =
+case mm: mutable.Map[_, _] =
--- End diff --

I thought it should be `A`, but a type parameter in that position has no effect, 
because erasure removes the type information at compile time.
Even so, should we write `A` for readability?





[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3789#issuecomment-68040214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24780/
Test PASSed.





[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3789#issuecomment-68040208
  
  [Test build #24780 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24780/consoleFull)
 for   PR 3789 at commit 
[`99f6f63`](https://github.com/apache/spark/commit/99f6f6342b98156a5f7771b0dd0d50c4e0f21a8c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3781#issuecomment-68040463
  
  [Test build #24782 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24782/consoleFull)
 for   PR 3781 at commit 
[`1b60fd1`](https://github.com/apache/spark/commit/1b60fd19bd0ef79e72fe568cf03d6976c7c32f97).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-24 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3755#issuecomment-68040630
  
Thanks for your suggestion too.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68041088
  
  [Test build #24783 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24783/consoleFull)
 for   PR 3787 at commit 
[`ee9c355`](https://github.com/apache/spark/commit/ee9c355dfc516dc612906be33c6baa9090cade0b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...

2014-12-24 Thread tsudukim
Github user tsudukim commented on the pull request:

https://github.com/apache/spark/pull/3467#issuecomment-68043299
  
Thank you for your comments! I'm going to do it in a few days!





[GitHub] spark pull request: [SPARK-4954][Core] add spark version infomatio...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3790#issuecomment-68043368
  
  [Test build #24781 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24781/consoleFull)
 for   PR 3790 at commit 
[`e05e1e3`](https://github.com/apache/spark/commit/e05e1e3e0bf747adc1f0c3e4c6461e92b5368c23).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4954][Core] add spark version infomatio...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3790#issuecomment-68043372
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24781/
Test PASSed.





[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3781#issuecomment-68045308
  
  [Test build #24782 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24782/consoleFull)
 for   PR 3781 at commit 
[`1b60fd1`](https://github.com/apache/spark/commit/1b60fd19bd0ef79e72fe568cf03d6976c7c32f97).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3781#issuecomment-68045313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24782/
Test PASSed.





[GitHub] spark pull request: [SPARK-4951][Core] Fix the issue that a busy e...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3783#issuecomment-68045928
  
  [Test build #24784 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24784/consoleFull)
 for   PR 3783 at commit 
[`05f6238`](https://github.com/apache/spark/commit/05f6238e988a54aada24ce85272212717fdc8c4e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68045940
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24783/
Test PASSed.





[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3787#issuecomment-68045936
  
  [Test build #24783 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24783/consoleFull)
 for   PR 3787 at commit 
[`ee9c355`](https://github.com/apache/spark/commit/ee9c355dfc516dc612906be33c6baa9090cade0b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3629][Doc] improve spark on yarn doc

2014-12-24 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/2813#issuecomment-68045956
  
@ssjssh could you reorganize your PR into 2 commits: one for the additions and 
modifications, the other for moving the text?





[GitHub] spark pull request: [SPARK-4951][Core] Fix the issue that a busy e...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3783#issuecomment-68050039
  
  [Test build #24784 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24784/consoleFull)
 for   PR 3783 at commit 
[`05f6238`](https://github.com/apache/spark/commit/05f6238e988a54aada24ce85272212717fdc8c4e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4951][Core] Fix the issue that a busy e...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3783#issuecomment-68050042
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24784/
Test PASSed.





[GitHub] spark pull request: [SPARK-4858] Add an option to turn off a progr...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3709#issuecomment-68050938
  
  [Test build #24785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24785/consoleFull)
 for   PR 3709 at commit 
[`0681403`](https://github.com/apache/spark/commit/06814035d454a9b7e444b0dc657a572a6ae2f899).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4858] Add an option to turn off a progr...

2014-12-24 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/3709#issuecomment-68050965
  
Fixed, please test it.





[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/3786#issuecomment-68051186
  
I don't like the approach of saying "for some reason, something happens" 
and then putting in a patch to address what happens, instead of identifying and 
correcting the reason it happens. If anything, patching the effect in 
that way can make identifying the underlying cause more difficult.

Maybe we'll end up using something like `maxStageRetryAttempts`, but I 
don't want to do so until after we clearly understand why that is needed.





[GitHub] spark pull request: [SPARK-4950] Delete obsolete mapReduceTripelet...

2014-12-24 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/3782#issuecomment-68052655
  
We wanted to retain binary compatibility for the Pregel API, which 
prevented adding the TripletFields parameter. Instead it might be better to add 
a second version of the Pregel API with several changes: manually-specified 
TripletFields, aggregateMessages-style API, and custom vertex activeness 
(#1217).





[GitHub] spark pull request: [SPARK-4386] Improve performance when writing ...

2014-12-24 Thread MickDavies
Github user MickDavies commented on the pull request:

https://github.com/apache/spark/pull/3254#issuecomment-68053039
  
@jimfcarroll - that's exactly the change I made. Performance improvements 
are very substantial for wide tables; as I said, in the case I was looking at it was 
6x as fast, and more significant still if you consider just the processing in 
Spark. Thanks for checking in the improvement.






[GitHub] spark pull request: [SPARK-4858] Add an option to turn off a progr...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3709#issuecomment-68055149
  
  [Test build #24785 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24785/consoleFull)
 for   PR 3709 at commit 
[`0681403`](https://github.com/apache/spark/commit/06814035d454a9b7e444b0dc657a572a6ae2f899).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4858] Add an option to turn off a progr...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3709#issuecomment-68055152
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24785/
Test PASSed.





[GitHub] spark pull request: Vectors.sparse() add support to unsorted indic...

2014-12-24 Thread hzlyx
GitHub user hzlyx opened a pull request:

https://github.com/apache/spark/pull/3791

Vectors.sparse() add support to unsorted indices

With the original method, when the indices are not strictly increasing, the 
sparse vector is created without any warning or error, but when the apply() 
method is used, only zero will be returned.
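
The patch repairs unsorted input by sorting the (index, value) pairs before building the vector. A minimal Java sketch of that reordering step (class and method names are illustrative, not MLlib's):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SparseVectorSort {
    /** Reorders both parallel arrays in place so that indices are ascending,
     *  mirroring indices.zip(values).sortBy(_._1).unzip in the Scala patch. */
    public static void sortByIndex(int[] indices, double[] values) {
        Integer[] order = new Integer[indices.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // sort positions by the index they point at
        Arrays.sort(order, Comparator.comparingInt(i -> indices[i]));
        int[] idxCopy = indices.clone();
        double[] valCopy = values.clone();
        for (int i = 0; i < order.length; i++) {
            indices[i] = idxCopy[order[i]];
            values[i] = valCopy[order[i]];
        }
    }

    public static void main(String[] args) {
        int[] idx = {3, 1, 2};
        double[] val = {30.0, 10.0, 20.0};
        sortByIndex(idx, val);
        System.out.println(Arrays.toString(idx) + " " + Arrays.toString(val));
        // prints [1, 2, 3] [10.0, 20.0, 30.0]
    }
}
```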

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hzlyx/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3791


commit fa3df2406cea3b73e3058510d0a6c5e4098dc22a
Author: Yuxi Liao liaoy...@huawei.com
Date:   2014-12-24T14:54:15Z

Vectors.sparse() add support to unsorted indices

With the original method, when the indices are not strictly increasing, the 
sparse vector is created without any warning or error, but when the apply() 
method is used, only zero will be returned.







[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3791#discussion_r22257875
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -173,11 +173,13 @@ object Vectors {
* Creates a sparse vector providing its index array and value array.
*
* @param size vector size.
-   * @param indices index array, must be strictly increasing.
-   * @param values value array, must have the same length as indices.
+   * @param indices index array.
+   * @param values value array.
*/
-  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector =
-    new SparseVector(size, indices, values)
+  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector = {
+    val (newIndices, newValues) = indices.zip(values).sortBy(_._1).unzip
--- End diff --

This is non-trivial overhead to introduce every time a vector is made, when 
the common case is that the indices are sorted, and the other cases are really 
caller error. I'd still suggest merely checking the sorting. There are lots of 
one-liners but this may be among the most efficient: `require((1 until 
indices.length).forall(i => indices(i-1) <= indices(i)))`
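
The validate-instead-of-sort alternative suggested here can be sketched in Java as follows (hypothetical names; note that the `<=` comparison above permits duplicate indices, while `<` would enforce the documented strictly-increasing contract):

```java
public class SortedCheck {
    /** Rejects out-of-order indices up front instead of silently sorting,
     *  a Java analogue of the suggested require(...) one-liner. */
    public static void requireSorted(int[] indices) {
        for (int i = 1; i < indices.length; i++) {
            if (indices[i - 1] > indices[i]) {  // use >= for strictly increasing
                throw new IllegalArgumentException(
                    "indices must be sorted, violated at position " + i);
            }
        }
    }

    public static void main(String[] args) {
        requireSorted(new int[]{1, 3, 5});  // passes silently
        try {
            requireSorted(new int[]{5, 1});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```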





[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3791#issuecomment-68057633
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang closed the pull request at:

https://github.com/apache/spark/pull/3245





[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/3245#issuecomment-68059290
  
ok, i will close this PR.





[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/3061#issuecomment-68059363
  
OK. I will close this PR.





[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang closed the pull request at:

https://github.com/apache/spark/pull/3061





[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-24 Thread ganonp
Github user ganonp commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68062042
  
Sorry I didn't mean to commit that norm method for this pull request. That 
said, I think it makes sense for norm to be public or at least a d=2 version of 
norm.
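
The d=2 special case mentioned here is tiny either way; a Java sketch of it (hypothetical helper, not the MLlib method):

```java
public class VectorNorm {
    /** Euclidean (d=2) norm: sqrt of the sum of squared components. */
    public static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x * x;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(norm2(new double[]{3.0, 4.0})); // prints 5.0
    }
}
```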





[GitHub] spark pull request: SPARK-4454 Fix race condition in DAGScheduler

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/3345#issuecomment-68068199
  
Ah yes, I see now.  Thanks for coming back to this one, Josh.

`DAGScheduler#getPreferredLocs` is definitely broken.  You're correct that 
the access to and potential update of the `cacheLocs` needs to be routed 
through the actor.  But because of the need to return the preferred locations, 
this will be a little different than the fire-and-forget messages that are 
currently sent to the `eventProcessActor`, and will need to be an 
[`ask`](http://doc.akka.io/docs/akka/2.3.4/scala/actors.html#Ask__Send-And-Receive-Future)
 pattern instead.

Something that also concerns me in looking at the usages of 
`SparkContext#getPreferredLocs` in `CoalescedRDD` and 
`PartitionerAwareUnionRDD` is that they both have a `currPrefLocs` method with 
a comment that this is supposed to "Get the *current* preferred locations from 
the DAGScheduler".  I'm not sure just what the expectation or requirement there 
for "current" is -- current when the RDD is defined, when actions are run on 
it, something else?  This feels like a potential race condition to me, and I am 
wondering whether it might make sense to make this getting of current preferred 
locations as lazy as possible and resolved during the execution of a job.  
That's just speculation as to the need for or desirability of that laziness, 
but I think it deserves a look.
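
This is not Akka, but the distinction being drawn — fire-and-forget messages versus an ask that returns a reply — can be sketched with a plain single-threaded executor, where all access to the shared state happens on the one event-loop thread (all names hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AskPattern {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private final Map<Integer, List<String>> cacheLocs = new HashMap<>();

    /** Fire-and-forget: submit an event and return immediately. */
    public void tell(Runnable event) {
        eventLoop.submit(event);
    }

    /** Ask: run on the event-loop thread and hand back a Future with the
     *  answer, so cacheLocs is only ever touched from one thread. */
    public Future<List<String>> askPreferredLocs(int partition) {
        return eventLoop.submit(() ->
            cacheLocs.computeIfAbsent(partition, p -> List.of("host-" + p)));
    }

    public void shutdown() {
        eventLoop.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AskPattern scheduler = new AskPattern();
        List<String> locs = scheduler.askPreferredLocs(7).get(); // blocks for the reply
        System.out.println(locs); // prints [host-7]
        scheduler.shutdown();
    }
}
```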





[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/3632#issuecomment-68069421
  
The reason for separate classes is to cleanly segregate the 
available/supportable functionality.  Not every `PairRDD` has keys that can be 
ordered, so `sortByKey` shouldn't be part of `PairRDD`.  When keys can be 
ordered, there is often a natural ordering that is already implicitly in scope. 
 When that is true, then we don't want to force the user to explicitly provide 
an `Ordering` -- e.g. if you have an `RDD[(Int, Foo)]`, then `rdd.sortByKey()` 
should just work.  If you want a different `Ordering`, then you just need to 
bring a new implicit `Ordering` for that key type into scope.

Things aren't as cleanly separated in the Java API because of the lack of 
support for implicits there, but that doesn't mean that we should abandon the 
separation between `PairRDD` and `OrderedRDD` on the Scala side or start 
dirtying-up `PairRDD.scala` when we want to provide new methods for RDDs whose 
keys and values can both be ordered.

I really think that we want to repeat the pattern of `OrderedRDD` for these 
`DoublyOrderedRDD` -- or whatever better name you can come up with.  The 
biggest quirk I can see right now is if the types of both keys and values are 
the same but you want to order them one way when sorting by key and a different 
way when doing the secondary sort on values.  That won't work with implicits 
since there can only be one implicit `Ordering` for the type in scope at a 
time.  The problem could either be avoided by using distinct types for the key 
and value roles, or a method signature with explicit orderings could be added 
to address this corner case. 
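
The explicit-orderings escape hatch for that corner case is easy to picture in Java, where comparators are always passed explicitly and so two different orderings of the same type can coexist (names illustrative):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SecondarySort {
    /** Sort pairs by key, breaking ties by value, with explicitly supplied
     *  orderings -- no ambiguity even when key and value share a type. */
    public static <K, V> void sortByKeyThenValue(
            List<Map.Entry<K, V>> pairs, Comparator<K> keyOrd, Comparator<V> valueOrd) {
        pairs.sort(Comparator
            .comparing((Map.Entry<K, V> e) -> e.getKey(), keyOrd)
            .thenComparing(Map.Entry::getValue, valueOrd));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>(List.of(
            Map.entry("b", "x"), Map.entry("a", "z"), Map.entry("a", "y")));
        // keys ascending, values descending: two String orderings at once,
        // which a single implicit Ordering[String] in scope could not express
        sortByKeyThenValue(pairs, Comparator.naturalOrder(), Comparator.reverseOrder());
        System.out.println(pairs); // prints [a=z, a=y, b=x]
    }
}
```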





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/3792

Added Java serialization util functions back in 
network/common/util/JavaUtils



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark java-ser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3792


commit 2a2ad9d6fcb98bbb7ffca1c0a5273f4ff8cb53a6
Author: Reynold Xin r...@databricks.com
Date:   2014-12-24T19:24:31Z

Added Java serialization util functions back in 
network/common/util/JavaUtils.







[GitHub] spark pull request: [Minor] Fix a typo of type parameter in JavaUt...

2014-12-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3789#discussion_r22262419
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -80,7 +80,7 @@ private[spark] object JavaUtils {
       prev match {
         case Some(k) =>
           underlying match {
-            case mm: mutable.Map[a, _] =>
+            case mm: mutable.Map[_, _] =>
--- End diff --

yea for readability let's use A





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3792#issuecomment-68070712
  
cc @aarondav





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3792#issuecomment-68070795
  
  [Test build #24786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24786/consoleFull)
 for   PR 3792 at commit 
[`2a2ad9d`](https://github.com/apache/spark/commit/2a2ad9d6fcb98bbb7ffca1c0a5273f4ff8cb53a6).
 * This patch merges cleanly.





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3792#discussion_r22262724
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -41,6 +41,34 @@
 public class JavaUtils {
   private static final Logger logger = 
LoggerFactory.getLogger(JavaUtils.class);
 
+  /** Deserialize a byte array using Java serialization. */
+  public static <T> T deserialize(byte[] bytes) {
+    try {
+      ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
+      Object out = is.readObject();
+      is.close();
+      return (T) out;
+    } catch (ClassNotFoundException e) {
+      throw new RuntimeException("Could not deserialize object", e);
--- End diff --

Yeah pretty standard formulation. Nit suggestion: don't throw general 
`RuntimeException` but something marginally more specific like 
`IllegalStateException`.
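
A sketch of such helpers with that nit applied — try-with-resources plus `IllegalStateException` — not the actual JavaUtils code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SerDe {
    /** Serialize an object to a byte array using Java serialization. */
    public static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize object", e);
        }
    }

    /** Deserialize a byte array; streams are closed even on failure. */
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize object", e);
        }
    }

    public static void main(String[] args) {
        String round = deserialize(serialize("hello"));
        System.out.println(round); // prints hello
    }
}
```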





[GitHub] spark pull request: [SPARK-4877] Allow user first classes to exten...

2014-12-24 Thread stephenh
Github user stephenh commented on the pull request:

https://github.com/apache/spark/pull/3725#issuecomment-68071870
  
Cool, sounds good. FWIW there are few things to do after this gets in:

a) document that if userClassPathFirst=true, then user's uberjar should not 
include any Spark or Scala code (or else they'll get class cast exceptions b/c 
the parent scala.Function will be different from the child scala.Function),

b) either accept Marcelo's PR as-is (which, among other things, applies the 
user-first classloader to driver code) or pull out just the driver part of his 
PR until the rest gets in (I've done this for our local Spark build),

c) as a few others have said, adapt the filtering logic from Jetty/Hadoop 
that will prefer scala.* and org.apache.spark.* (and a few others) from the 
parent classloader all the time, even if the user's uberjar does accidentally 
include them (at this point, the documentation added in a) could be removed).

I included these in order of small -> large, with the idea that, unless 
someone beats me to it (which would be great :-)), I'll progressively work 
through each one.
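
A minimal sketch of the filtering idea in (c): a user-first loader that nonetheless always delegates framework packages to the parent. The prefix list and class shape are illustrative, not Spark's actual implementation:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class UserFirstClassLoader extends URLClassLoader {
    // Prefixes always taken from the parent, even if the user's uberjar
    // accidentally bundles them (illustrative list, not Spark's).
    private static final String[] PARENT_FIRST = {"java.", "scala.", "org.apache.spark."};

    public UserFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        for (String prefix : PARENT_FIRST) {
            if (name.startsWith(prefix)) {
                return super.loadClass(name, resolve); // standard parent delegation
            }
        }
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                c = findClass(name);                   // user jars first...
            } catch (ClassNotFoundException e) {
                c = super.loadClass(name, resolve);    // ...parent as fallback
            }
        }
        if (resolve) resolveClass(c);
        return c;
    }

    public static void main(String[] args) throws Exception {
        UserFirstClassLoader cl = new UserFirstClassLoader(
            new URL[0], UserFirstClassLoader.class.getClassLoader());
        // framework classes still resolve to the parent's single copy,
        // avoiding the class-cast problem described in (a)
        System.out.println(cl.loadClass("java.lang.String") == String.class); // prints true
    }
}
```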





[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0

2014-12-24 Thread nchammas
GitHub user nchammas opened a pull request:

https://github.com/apache/spark/pull/3793

[EC2] Update default Spark version to 1.2.0

Now that 1.2.0 is out, let's update the default Spark version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nchammas/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3793


commit ec0e904608eaa65bbbf35b2558a0116387abaecf
Author: Nicholas Chammas nicholas.cham...@gmail.com
Date:   2014-12-24T20:10:02Z

[EC2] Update default Spark version to 1.2.0







[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0

2014-12-24 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3793#issuecomment-68072359
  
cc @JoshRosen 





[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3793#issuecomment-68072413
  
  [Test build #24787 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24787/consoleFull)
 for   PR 3793 at commit 
[`ec0e904`](https://github.com/apache/spark/commit/ec0e904608eaa65bbbf35b2558a0116387abaecf).
 * This patch merges cleanly.





[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3793#issuecomment-68073160
  
It looks like this was already done in `branch-1.2`, but it doesn't hurt to 
do it in master: 
https://github.com/apache/spark/commit/dfb8c65b730fdf60540e91cd74fbaa2764a2a2bc

If it's not already there, we should add this to the preparing a release 
checklist on the wiki.





[GitHub] spark pull request: [SPARK-4890] Upgrade Boto to 2.34.0; automatic...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3737#issuecomment-68073531
  
@nchammas Thanks for raising those concerns.  The `--help` issue might not 
be too hard to fix (we may be able to do some lazy-loading of `boto`).  For 
read-only mounts, I don't see a great solution: I don't want to continue 
bundling a zip file in the Spark source, since the boto download is huge (even 
after compression).  Maybe we could package it when making binary 
distributions, though.





[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0

2014-12-24 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3793#issuecomment-68073619
  
Hmm, master is [already on 
1.3.0](https://github.com/apache/spark/blob/199e59aacd540e17b31f38e0e32a3618870e9055/docs/_config.yml#L16)
 in that config file in dfb8c65.





[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3707#discussion_r22263467
  
--- Diff: build/mvn ---
@@ -0,0 +1,130 @@
+#!/usr/bin/env bash
+
+# Determine the current working directory
+_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+# Installs any application tarball given a URL, the expected tarball name,
+# and, optionally, a checkable binary path to determine if the binary has
+# already been installed
+## Arg1 - URL
+## Arg2 - Tarball Name
+## Arg3 - Checkable Binary
+install_app() {
+  local remote_tarball="$1/$2"
+  local local_tarball="${_DIR}/$2"
+  local binary="${_DIR}/$3"
+
+  # setup `curl` and `wget` silent options if we're running on Jenkins
+  local curl_opts=""
+  local wget_opts=""
+  if [ -n "$AMPLAB_JENKINS" ]; then
+    curl_opts="-s"
+    wget_opts="--quiet"
+  else
+    curl_opts="--progress-bar"
+    wget_opts="--progress=bar:force"
+  fi
+
+  if [ -z "$3" -o ! -f "$binary" ]; then
+    # check if we already have the tarball
+    # check if we have curl installed
+    # download application
+    [ ! -f "${local_tarball}" ] && [ -n "`which curl 2>/dev/null`" ] && \
+      echo "exec: curl ${curl_opts} ${remote_tarball}" && \
+      curl ${curl_opts} "${remote_tarball}" > "${local_tarball}"
+    # if the file still doesn't exist, lets try `wget` and cross our fingers
+    [ ! -f "${local_tarball}" ] && [ -n "`which wget 2>/dev/null`" ] && \
+      echo "exec: wget ${wget_opts} ${remote_tarball}" && \
+      wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}"
+    # if both were unsuccessful, exit
+    [ ! -f "${local_tarball}" ] && \
+      echo -n "ERROR: Cannot download $2 with cURL or wget; " && \
+      echo "please install manually and try again." && \
+      exit 2
+    cd "${_DIR}" && tar -xzf "$2"
+    rm -rf "$local_tarball"
+  fi
+}
+
+# Install maven under the build/ folder
+install_mvn() {
+  install_app \
+    "http://apache.claz.org/maven/maven-3/3.2.3/binaries" \
+    "apache-maven-3.2.3-bin.tar.gz" \
+    "apache-maven-3.2.3/bin/mvn"
+  MVN_BIN="${_DIR}/apache-maven-3.2.3/bin/mvn"
+}
+
+# Install zinc under the build/ folder
+install_zinc() {
+  local zinc_path="zinc-0.3.5.3/bin/zinc"
+  [ ! -f "${zinc_path}" ] && ZINC_INSTALL_FLAG=1
+  install_app \
+    "http://downloads.typesafe.com/zinc/0.3.5.3" \
+    "zinc-0.3.5.3.tgz" \
+    "${zinc_path}"
+  ZINC_BIN="${_DIR}/${zinc_path}"
+}
+
+# Determine the Scala version from the root pom.xml file, set the Scala URL,
+# and, with that, download the specific version of Scala necessary under
+# the build/ folder
+install_scala() {
+  # determine the Scala version used in Spark
+  local scala_version=`grep "scala.version" "${_DIR}/../pom.xml" | \
--- End diff --

That would probably be less brittle, but I guess it would introduce another 
dependency which we'd have to install in this script (since we want it to be a 
one-click installer).  Since `xmlstarlet` binaries aren't portable, the 
installation logic might be complex since we'd need to have some 
platform-specific logic to download the right binaries.
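
For comparison, on the JVM side the same property can be read with a real XML parser and no extra binary at all; a hedged sketch of the less brittle approach being discussed (the inline pom string below is a stand-in for Spark's actual pom.xml):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PomVersion {
    /** Extract the scala.version property with a proper XML parser instead
     *  of grep/sed over the raw text. */
    public static String scalaVersion(String pomXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(pomXml)));
        return XPathFactory.newInstance().newXPath()
            .evaluate("/project/properties/scala.version", doc);
    }

    public static void main(String[] args) throws Exception {
        String pom = "<project><properties>"
                   + "<scala.version>2.10.4</scala.version>"
                   + "</properties></project>";
        System.out.println(scalaVersion(pom)); // prints 2.10.4
    }
}
```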





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3792#issuecomment-68073817
  
  [Test build #24786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24786/consoleFull)
 for   PR 3792 at commit 
[`2a2ad9d`](https://github.com/apache/spark/commit/2a2ad9d6fcb98bbb7ffca1c0a5273f4ff8cb53a6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3792#issuecomment-68073819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24786/
Test PASSed.





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-68074101
  
I did a quick `git grep` through the codebase to find uses of `SPARK_HOME` 
and it looks like there's only a few places where it's read:

SparkContext, which is a fallback if `spark.home` is not set:

```
core/src/main/scala/org/apache/spark/SparkContext.scala-   * Get Spark's 
home location from either a value set through the constructor,
core/src/main/scala/org/apache/spark/SparkContext.scala-   * or the 
spark.home Java property, or the SPARK_HOME environment variable
core/src/main/scala/org/apache/spark/SparkContext.scala-   * (in that order 
of preference). If neither of these is set, return None.
core/src/main/scala/org/apache/spark/SparkContext.scala-   */
core/src/main/scala/org/apache/spark/SparkContext.scala-  private[spark] 
def getSparkHome(): Option[String] = {
core/src/main/scala/org/apache/spark/SparkContext.scala:    conf.getOption("spark.home").orElse(Option(System.getenv("SPARK_HOME")))
core/src/main/scala/org/apache/spark/SparkContext.scala-  }
core/src/main/scala/org/apache/spark/SparkContext.scala-
core/src/main/scala/org/apache/spark/SparkContext.scala-  /**
core/src/main/scala/org/apache/spark/SparkContext.scala-   * Set the 
thread-local property for overriding the call sites
core/src/main/scala/org/apache/spark/SparkContext.scala-   * of actions and 
RDDs.
```

PythonUtils, with no fallback:

```
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-

core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-private[spark]
 object PythonUtils {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-  /** Get 
the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our 
JAR */
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-  def 
sparkPythonPath: String = {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-val 
pythonPath = new ArrayBuffer[String]
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala:    for (sparkHome <- sys.env.get("SPARK_HOME")) {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-      pythonPath += Seq(sparkHome, "python").mkString(File.separator)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-      pythonPath += Seq(sparkHome, "python", "lib", "py4j-0.8.2.1-src.zip").mkString(File.separator)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-}
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-
pythonPath ++= SparkContext.jarOfObject(this)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-
pythonPath.mkString(File.pathSeparator)
```

FaultToleranceTest, which isn't actually run in our tests (since it needs a 
bunch of manual Docker setup to work):

```
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val 
zk =  SparkCuratorUtil.newClient(conf)
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  var 
numPassed = 0
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  var 
numFailed = 0
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala:  val sparkHome = System.getenv("SPARK_HOME")
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  assertTrue(sparkHome != null, "Run with a valid SPARK_HOME")
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val containerSparkHome = "/opt/spark"
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val dockerMountDir = "%s:%s".format(sparkHome, containerSparkHome)
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
```

SparkSubmitArguments, which uses this without a fallback:

```
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-   */
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-  
private def mergeSparkProperties(): Unit = {
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-
// Use common defaults file, if not specified by user
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-
if (propertiesFile == null) {
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala- 
 val sep = File.separator
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      val sparkHomeConfig = env.get("SPARK_HOME").map(sparkHome => s"${sparkHome}${sep}conf")
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala- 
 
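The precedence the survey above describes for SparkContext (the `spark.home` property wins, then the `SPARK_HOME` environment variable) can be sketched in plain JDK terms. The helper below is hypothetical, not Spark API, and uses maps in place of `SparkConf` and the real environment:

```java
import java.util.Map;

public class SparkHomeLookup {
    // Mirrors the precedence quoted from SparkContext above: the spark.home
    // property wins over the SPARK_HOME environment variable; null if neither
    // is set (the real SparkContext returns an Option[String] instead).
    public static String resolve(Map<String, String> conf, Map<String, String> env) {
        String fromConf = conf.get("spark.home");
        return fromConf != null ? fromConf : env.get("SPARK_HOME");
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("spark.home", "/opt/a"),
                                   Map.of("SPARK_HOME", "/opt/b"))); // prints /opt/a
        System.out.println(resolve(Map.of(),
                                   Map.of("SPARK_HOME", "/opt/b"))); // prints /opt/b
    }
}
```

The other call sites quoted above (PythonUtils, FaultToleranceTest, SparkSubmitArguments) skip the property and read the environment variable directly, which is why removing `SPARK_HOME` handling affects them differently.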

[GitHub] spark pull request: [SPARK-3398] [SPARK-4325] [EC2] Use EC2 status...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3195#issuecomment-68074192
  
Doh!  Unfortunately, there's no way for us to go back and edit commit 
messages without messing up the git history.  In the future, I'll be more 
careful about verifying commit messages before merging (I should make a review 
checklist for these sorts of steps...)





[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-68074240
  
Since it's not obvious what's failing, I guess I'll have to log into 
Jenkins and look at the logs.





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3710#discussion_r22263727
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/StreamingListenerBus.scala
 ---
@@ -64,36 +71,40 @@ private[spark] class StreamingListenerBus() extends 
Logging {
   }
 
   def addListener(listener: StreamingListener) {
-listeners += listener
+listeners.add(listener)
   }
 
   def post(event: StreamingListenerEvent) {
+    if (stopped) {
+      // Drop further events to make `StreamingListenerShutdown` be delivered ASAP
+      logError("StreamingListenerBus has been stopped! Drop " + event)
+      return
+    }
     val eventAdded = eventQueue.offer(event)
-    if (!eventAdded && !queueFullErrorMessageLogged) {
+    if (!eventAdded && queueFullErrorMessageLogged.compareAndSet(false, true)) {
       logError("Dropping StreamingListenerEvent because no remaining room in event queue. " +
         "This likely means one of the StreamingListeners is too slow and cannot keep up with the " +
         "rate at which events are being started by the scheduler.")
-      queueFullErrorMessageLogged = true
     }
   }
 
-  /**
-   * Waits until there are no more events in the queue, or until the specified time has elapsed.
-   * Used for testing only. Returns true if the queue has emptied and false is the specified time
-   * elapsed before the queue emptied.
-   */
-  def waitUntilEmpty(timeoutMillis: Int): Boolean = {
-    val finishTime = System.currentTimeMillis + timeoutMillis
-    while (!eventQueue.isEmpty) {
-      if (System.currentTimeMillis > finishTime) {
-        return false
+  def stop(): Unit = {
+    stopped = true
+    // Should not call `post`, or `StreamingListenerShutdown` may be dropped.
+    eventQueue.put(StreamingListenerShutdown)
+    listenerThread.join()
+  }
+
+  private def foreachListener(f: StreamingListener => Unit): Unit = {
+    val iter = listeners.iterator
--- End diff --

Nit: Can you add a comment mentioning why you used an iterator, so that this 
does not regress in the future? This is quite subtle. 
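One way to see why the explicit iterator is subtle: if `listeners` is a `CopyOnWriteArrayList` (as the surrounding changes suggest, though the declaration isn't visible in this hunk), its iterator is a snapshot, so listeners added or removed concurrently during delivery can never cause a `ConcurrentModificationException`. A hedged Java sketch of that behavior:

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterator {
    // Iterates over a snapshot taken before a concurrent add and reports how
    // many elements the snapshot saw.
    public static int deliveredAfterConcurrentAdd() {
        CopyOnWriteArrayList<String> listeners = new CopyOnWriteArrayList<>();
        listeners.add("listenerA");

        // The iterator captures the list's state at creation time.
        Iterator<String> it = listeners.iterator();
        listeners.add("listenerB"); // concurrent add: no CME, just not visible

        int delivered = 0;
        while (it.hasNext()) {
            it.next();
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(deliveredAfterConcurrentAdd()); // prints 1
    }
}
```

A plain `ArrayList` iterated the same way would throw `ConcurrentModificationException` on `it.next()` after the second `add`, which is exactly the failure mode this PR family (SPARK-4952) is about.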





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-68074359
  
@JoshRosen Yes, sounds about right to me. I rebased and pushed one more 
commit to remove special `SPARK_HOME` setting in these modules too.





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3710#discussion_r22263766
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/StreamingListenerBus.scala
 ---
@@ -64,36 +71,40 @@ private[spark] class StreamingListenerBus() extends 
Logging {
   }
 
   def addListener(listener: StreamingListener) {
-listeners += listener
+listeners.add(listener)
   }
 
   def post(event: StreamingListenerEvent) {
+    if (stopped) {
+      // Drop further events to make `StreamingListenerShutdown` be delivered ASAP
+      logError("StreamingListenerBus has been stopped! Drop " + event)
+      return
+    }
     val eventAdded = eventQueue.offer(event)
-    if (!eventAdded && !queueFullErrorMessageLogged) {
+    if (!eventAdded && queueFullErrorMessageLogged.compareAndSet(false, true)) {
       logError("Dropping StreamingListenerEvent because no remaining room in event queue. " +
         "This likely means one of the StreamingListeners is too slow and cannot keep up with the " +
         "rate at which events are being started by the scheduler.")
-      queueFullErrorMessageLogged = true
     }
   }
 
-  /**
-   * Waits until there are no more events in the queue, or until the specified time has elapsed.
-   * Used for testing only. Returns true if the queue has emptied and false is the specified time
-   * elapsed before the queue emptied.
-   */
-  def waitUntilEmpty(timeoutMillis: Int): Boolean = {
-    val finishTime = System.currentTimeMillis + timeoutMillis
-    while (!eventQueue.isEmpty) {
-      if (System.currentTimeMillis > finishTime) {
-        return false
+  def stop(): Unit = {
+    stopped = true
+    // Should not call `post`, or `StreamingListenerShutdown` may be dropped.
+    eventQueue.put(StreamingListenerShutdown)
+    listenerThread.join()
+  }
+
+  private def foreachListener(f: StreamingListener => Unit): Unit = {
+    val iter = listeners.iterator
--- End diff --

If this change is to avoid an implicit Java -> Scala collections 
conversion, why not replace the `JavaConversions` implicits with the more 
explicit `JavaConverters` instead, so that you have to manually write `.asJava` 
or `.asScala`?  That, in addition to a comment, would make it more obvious if 
we're re-introducing those conversions.





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3710#discussion_r22263767
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/StreamingListenerBus.scala
 ---
@@ -64,36 +71,40 @@ private[spark] class StreamingListenerBus() extends 
Logging {
   }
 
   def addListener(listener: StreamingListener) {
-listeners += listener
+listeners.add(listener)
   }
 
   def post(event: StreamingListenerEvent) {
+    if (stopped) {
+      // Drop further events to make `StreamingListenerShutdown` be delivered ASAP
+      logError("StreamingListenerBus has been stopped! Drop " + event)
+      return
+    }
     val eventAdded = eventQueue.offer(event)
-    if (!eventAdded && !queueFullErrorMessageLogged) {
+    if (!eventAdded && queueFullErrorMessageLogged.compareAndSet(false, true)) {
       logError("Dropping StreamingListenerEvent because no remaining room in event queue. " +
         "This likely means one of the StreamingListeners is too slow and cannot keep up with the " +
         "rate at which events are being started by the scheduler.")
-      queueFullErrorMessageLogged = true
     }
   }
 
-  /**
-   * Waits until there are no more events in the queue, or until the specified time has elapsed.
-   * Used for testing only. Returns true if the queue has emptied and false is the specified time
-   * elapsed before the queue emptied.
-   */
-  def waitUntilEmpty(timeoutMillis: Int): Boolean = {
-    val finishTime = System.currentTimeMillis + timeoutMillis
-    while (!eventQueue.isEmpty) {
-      if (System.currentTimeMillis > finishTime) {
-        return false
+  def stop(): Unit = {
+    stopped = true
+    // Should not call `post`, or `StreamingListenerShutdown` may be dropped.
+    eventQueue.put(StreamingListenerShutdown)
--- End diff --

Why is this `put` still there? Wouldn't it block or throw an error if the 
queue is full?
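For reference, the difference between the two calls on a bounded `LinkedBlockingQueue` (the capacity-1 queue here is purely for illustration): `offer` fails fast and returns `false` when the queue is full, while `put` blocks until space frees up.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class PutVsOffer {
    // Fills a capacity-1 queue, then attempts a second insert with offer.
    public static boolean offerWhenFull() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        try {
            queue.put("first"); // fits; on a full queue this call would block
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queue.offer("second"); // queue full: returns false immediately
    }

    public static void main(String[] args) {
        System.out.println(offerWhenFull()); // prints false
    }
}
```

So a `put` of the shutdown event on a full queue would indeed park the calling thread until a slot opened, rather than throwing, which is the blocking concern raised above.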





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-68074482
  
  [Test build #24788 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24788/consoleFull)
 for   PR 3651 at commit 
[`2e8a0af`](https://github.com/apache/spark/commit/2e8a0afeef77df8fd9c7df406812878b22c67aa7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3710#discussion_r22263810
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/StreamingListenerBus.scala
 ---
@@ -39,18 +45,19 @@ private[spark] class StreamingListenerBus() extends 
Logging {
 val event = eventQueue.take
--- End diff --

This does not use `stopped` like the `LiveListenerBus`. I know that 
introducing `eventLock` and using `eventQueue.poll` instead of 
`eventQueue.take` like the LiveListenerBus is too much for this PR. But at 
least we can eliminate the bug related to `StreamingListenerShutdown` by using 
`stopped` instead of `StreamingListenerShutdown`.
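A minimal sketch of the `stopped`-based drain suggested above (class and method names are illustrative, and the single-threaded setup stands in for the real bus, which polls from a dedicated listener thread):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StoppedFlagBus {
    static volatile boolean stopped = false;

    // Deliver queued events until stop() has been signalled AND the queue has
    // drained, using poll-with-timeout instead of a Shutdown sentinel event.
    static List<String> drain(LinkedBlockingQueue<String> queue) {
        List<String> delivered = new ArrayList<>();
        while (!(stopped && queue.isEmpty())) {
            try {
                String event = queue.poll(10, TimeUnit.MILLISECONDS);
                if (event != null) {
                    delivered.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("event-1");
        queue.add("event-2");
        stopped = true; // in real code, stop() sets this from another thread
        System.out.println(drain(queue)); // prints [event-1, event-2]
    }
}
```

Because termination is decided by the flag plus an empty queue, no sentinel event has to be enqueued, so `stop()` cannot block on a full queue the way a `put` of a shutdown event can.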





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3710#discussion_r22263834
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -35,8 +36,9 @@ private[spark] class LiveListenerBus extends 
SparkListenerBus with Logging {
* an OOM exception) if it's perpetually being added to more quickly 
than it's being drained. */
--- End diff --

Please update the comment about `SparkListenerShutdown` in the 
documentation of this class. Also, we should probably remove the declaration of 
`SparkListenerShutdown` and any references to it. git-grep to check.





[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Improve LiveList...

2014-12-24 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3710#issuecomment-68074729
  
I left a few more comments. There are clear inconsistencies between 
`LiveListenerBus` and `StreamingListenerBus`, which can only be solved by 
actually having `StreamingListenerBus` inherit `LiveListenerBus`. Since that is 
to be a different PR, I would suggest that we at least try to maintain 
feature parity between them even with duplicate code. For example, the bug with 
posting the Shutdown event should be solved for both classes.




