[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/10444#discussion_r48332590
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -47,6 +48,34 @@ trait Predicate extends Expression {
   override def dataType: DataType = BooleanType
 }
 
+object Predicate extends PredicateHelper {
+  def toCNF(predicate: Expression, maybeThreshold: Option[Double] = None): Expression = {
+    val cnf = new CNFExecutor(predicate).execute(predicate)
+    val threshold = maybeThreshold.map(predicate.size * _).getOrElse(Double.MaxValue)
+    if (cnf.size > threshold) predicate else cnf
--- End diff --

When the threshold is exceeded, the original predicate rather than the
intermediate converted predicate is returned. This is because the intermediate
result may not be in CNF, thus:

1. It doesn't bring much benefit for filter push-down, and
2. It's much larger than the original predicate and brings extra evaluation cost.
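
For intuition about why such a cap is needed, here is a standalone toy sketch (it does not use the PR's `CNFExecutor`; `size` here is a simple node count) showing how naive OR-over-AND distribution makes CNF size explode, roughly 2^n clauses for an n-way disjunction of two-literal conjunctions:

```scala
object CnfBlowupDemo {
  sealed trait Expr { def size: Int }
  case class Var(name: String) extends Expr { val size = 1 }
  case class And(l: Expr, r: Expr) extends Expr { val size = l.size + r.size + 1 }
  case class Or(l: Expr, r: Expr) extends Expr { val size = l.size + r.size + 1 }

  // Naive CNF conversion: recursively push ORs below ANDs by distribution.
  def toCNF(e: Expr): Expr = e match {
    case And(l, r) => And(toCNF(l), toCNF(r))
    case Or(l, r) =>
      (toCNF(l), toCNF(r)) match {
        case (And(a, b), c) => And(toCNF(Or(a, c)), toCNF(Or(b, c))) // distribute
        case (a, And(b, c)) => And(toCNF(Or(a, b)), toCNF(Or(a, c))) // distribute
        case (a, b)         => Or(a, b)
      }
    case v => v
  }

  // Builds (a1 && b1) || (a2 && b2) || ... || (an && bn).
  def dnf(n: Int): Expr =
    (1 to n).map(i => And(Var(s"a$i"), Var(s"b$i")): Expr).reduce(Or(_, _))

  def main(args: Array[String]): Unit =
    (1 to 6).foreach { n =>
      val p = dnf(n)
      println(s"n=$n  original size=${p.size}  CNF size=${toCNF(p).size}")
    }
}
```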





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166850088
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48235/
Test FAILed.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166850081
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166850935
  
**[Test build #48236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48236/consoleFull)** for PR 10444 at commit [`031278f`](https://github.com/apache/spark/commit/031278f9b8e8177640b3b6728a058eb153b9a2b4).





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166852843
  
cc @nongli @rxin @marmbrus @yjshen





[GitHub] spark pull request: [SPARK-6624][SQL]Add CNF Normalization as part...

2015-12-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/8200#discussion_r48333880
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -464,6 +465,24 @@ object OptimizeIn extends Rule[LogicalPlan] {
 }
 
 /**
+  * Convert an expression into its conjunctive normal form (CNF), i.e. AND of ORs.
+  * For example, a && b || c is normalized to (a || c) && (b || c) by this method.
+  *
+  * Refer to https://en.wikipedia.org/wiki/Conjunctive_normal_form for more information
+  */
+object CNFNormalization extends Rule[LogicalPlan] {
--- End diff --

(I wonder how traditional RDBMSs cope with the CNF exponential expansion issue?)





[GitHub] spark pull request: [Spark-10625] [SQL] Spark SQL JDBC read/write ...

2015-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8785#issuecomment-166854056
  
@tribbloid let's wrap this up at last. Can you please review all the 
outstanding comments and address them or else close this?





[GitHub] spark pull request: Add more exceptions to Guava relocation

2015-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10442#issuecomment-166853471
  
Why does this need to be added? 
The Guava exceptions are going away in 2.x anyway.
This also doesn't follow 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark





[GitHub] spark pull request: Add more exceptions to Guava relocation

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10442#discussion_r48333932
  
--- Diff: pom.xml ---
@@ -99,14 +99,14 @@
     <module>sql/hive</module>
     <module>unsafe</module>
     <module>assembly</module>
-    <module>external/twitter</module>
-    <module>external/flume</module>
-    <module>external/flume-sink</module>
-    <module>external/flume-assembly</module>
-    <module>external/mqtt</module>
-    <module>external/mqtt-assembly</module>
-    <module>external/zeromq</module>
-    <module>examples</module>
+
--- End diff --

Most of these changes you've made can't be merged. Do you mind closing this 
PR?





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334101
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -107,7 +135,7 @@ private[spark] object SQLConf {
         isPublic: Boolean = true): SQLConfEntry[Long] =
       SQLConfEntry(key, defaultValue, { v =>
         try {
-          v.toLong
+          Utils.byteStringAsBytes(v)
--- End diff --

This is still wrong





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334129
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
         }
       }, _.toString, doc, isPublic)
 
+    def intMemConf(
--- End diff --

The spacing and alignment are wrong throughout this method





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334144
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
         }
       }, _.toString, doc, isPublic)
 
+    def intMemConf(
+        key: String,
+        defaultValue: Option[Int] = None,
+        doc: String = "",
+        isPublic: Boolean = true): SQLConfEntry[Int] =
+      SQLConfEntry(key, defaultValue, { v =>
+        var isNegative: Boolean = false
--- End diff --

Why `var`? Assign the result of the try to a `val`. No need for a type annotation.
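
For illustration, a minimal sketch of the suggested shape (standalone; `v` is a plain parameter standing in for the config value string, and catching `NumberFormatException` rather than `Throwable` is an assumption about the intended failure case):

```scala
// Sketch of the suggestion: bind the try expression's result directly to a
// val; no var, and no explicit type annotation needed.
def isNegative(v: String): Boolean =
  try {
    v.toInt < 0
  } catch {
    // Assumed intent: a non-numeric string simply counts as non-negative here
    // and is parsed as a byte string afterwards.
    case _: NumberFormatException => false
  }
```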





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334160
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
         }
       }, _.toString, doc, isPublic)
 
+    def intMemConf(
+        key: String,
+        defaultValue: Option[Int] = None,
+        doc: String = "",
+        isPublic: Boolean = true): SQLConfEntry[Int] =
+      SQLConfEntry(key, defaultValue, { v =>
+        var isNegative: Boolean = false
+        try {
+          isNegative = (v.toInt < 0)
+        } catch {
+          case _: Throwable =>
--- End diff --

Why `Throwable`? I assume you mean `NumberFormatException`.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334198
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
         }
       }, _.toString, doc, isPublic)
 
+    def intMemConf(
+        key: String,
+        defaultValue: Option[Int] = None,
+        doc: String = "",
+        isPublic: Boolean = true): SQLConfEntry[Int] =
+      SQLConfEntry(key, defaultValue, { v =>
+        var isNegative: Boolean = false
+        try {
+          isNegative = (v.toInt < 0)
+        } catch {
+          case _: Throwable =>
+        }
+        if (!isNegative) {
--- End diff --

It's clearer to flip this conditional.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48334213
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
         }
       }, _.toString, doc, isPublic)
 
+    def intMemConf(
+        key: String,
+        defaultValue: Option[Int] = None,
+        doc: String = "",
+        isPublic: Boolean = true): SQLConfEntry[Int] =
+      SQLConfEntry(key, defaultValue, { v =>
+        var isNegative: Boolean = false
+        try {
+          isNegative = (v.toInt < 0)
+        } catch {
+          case _: Throwable =>
+        }
+        if (!isNegative) {
+          if ((Utils.byteStringAsBytes(v) <= Int.MaxValue.toLong) &&
--- End diff --

Why compute this 3 times?
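
A sketch of the hoisting being suggested (the quoted diff is truncated, so the remaining comparisons and the error branch are assumptions, as is direct access to Spark's internal `Utils` from `org.apache.spark.util`):

```scala
import org.apache.spark.util.Utils

// Sketch only: call Utils.byteStringAsBytes once, bind it to a val, and reuse
// the result instead of recomputing it in every comparison.
def parseIntMemConf(key: String, v: String): Int = {
  val bytes = Utils.byteStringAsBytes(v) // computed a single time
  // Assumed bounds check; the real conditions are elided in the quoted diff.
  require(bytes >= 0L && bytes <= Int.MaxValue.toLong,
    s"$key: value '$v' does not fit in an Int number of bytes")
  bytes.toInt
}
```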





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-166855061
  
Are there other configs that should take a memory string?





[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10289#issuecomment-166856069
  
**[Test build #2251 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2251/consoleFull)** for PR 10289 at commit [`cb6a862`](https://github.com/apache/spark/commit/cb6a8626e9c29d8b4a1f4bd2f0163784cf306bb5).





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166867696
  
**[Test build #48236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48236/consoleFull)** for PR 10444 at commit [`031278f`](https://github.com/apache/spark/commit/031278f9b8e8177640b3b6728a058eb153b9a2b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166868044
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48236/
Test PASSed.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166868043
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

2015-12-23 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/10445

[SPARK-12498][SQL][MINOR] BooleanSimplication simplification

Scala syntax and the existing Catalyst expression DSL allow us to refactor the
`BooleanSimplification` optimization rule into a clearer, more readable form.
You hardly need any extra comments to understand most of the transformations.
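
For a flavor of the pattern-match style such a rule uses, here is a toy sketch; `BoolExpr`, `True`, `AndExpr`, and `OrExpr` are stand-ins for Catalyst's expression classes, not the PR's actual code (a real rule would also run to a fixed point):

```scala
sealed trait BoolExpr
case object True extends BoolExpr
case object False extends BoolExpr
case class AndExpr(l: BoolExpr, r: BoolExpr) extends BoolExpr
case class OrExpr(l: BoolExpr, r: BoolExpr) extends BoolExpr

// Each case documents itself: the pattern is the algebraic identity.
def simplify(e: BoolExpr): BoolExpr = e match {
  case AndExpr(True, x)                      => simplify(x) // true && x  ==> x
  case AndExpr(x, True)                      => simplify(x)
  case AndExpr(False, _) | AndExpr(_, False) => False       // false && _ ==> false
  case OrExpr(False, x)                      => simplify(x) // false || x ==> x
  case OrExpr(x, False)                      => simplify(x)
  case OrExpr(True, _) | OrExpr(_, True)     => True        // true || _  ==> true
  case AndExpr(l, r)                         => AndExpr(simplify(l), simplify(r))
  case OrExpr(l, r)                          => OrExpr(simplify(l), simplify(r))
  case other                                 => other
}
```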

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark boolean-simplification-simplification

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10445


commit 41abd4e22b69d2c96e95eb5e4bd329b195731f1b
Author: Cheng Lian 
Date:   2015-12-23T11:03:28Z

BooleanSimplication simplification







[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166869105
  
**[Test build #48232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48232/consoleFull)** for PR 10391 at commit [`7389e15`](https://github.com/apache/spark/commit/7389e157c1e1411a4115df29969359888f13c502).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166869266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48232/
Test FAILed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166869262
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10445#issuecomment-166870629
  
**[Test build #48237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48237/consoleFull)** for PR 10445 at commit [`41abd4e`](https://github.com/apache/spark/commit/41abd4e22b69d2c96e95eb5e4bd329b195731f1b).





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166870843
  
**[Test build #48238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48238/consoleFull)** for PR 10444 at commit [`6e2c052`](https://github.com/apache/spark/commit/6e2c0524723b6689e17550dad2b515a6e7602cb4).





[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10289#issuecomment-166878686
  
**[Test build #2251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2251/consoleFull)** for PR 10289 at commit [`cb6a862`](https://github.com/apache/spark/commit/cb6a8626e9c29d8b4a1f4bd2f0163784cf306bb5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties `





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166879266
  
retest this please.





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10443#discussion_r48346325
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
@@ -241,7 +241,7 @@ case class NewInstance(
       $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
       if ($argsNonNull) {
         ${ev.value} = $constructorCall;
-        ${ev.isNull} = false;
+        ${ev.isNull} = ${ev.value} == null;
--- End diff --

Since whether it is null depends on `ev.value` regardless of the value of
`propagateNull`, should we set the `nullable` of `NewInstance` to true?
Currently it is derived from `propagateNull`.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166924872
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166924875
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48244/
Test FAILed.





[GitHub] spark pull request: [SPARK-12499] [build] don't force MAVEN_OPTS

2015-12-23 Thread abridgett
GitHub user abridgett opened a pull request:

https://github.com/apache/spark/pull/10448

[SPARK-12499] [build] don't force MAVEN_OPTS

allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abridgett/spark feature/do_not_force_maven_opts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10448


commit 2f90b4db1fe52b7d80250137105bd6127e94bda0
Author: Adrian Bridgett 
Date:   2015-12-23T15:44:25Z

don't force MAVEN_OPTS







[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10443#discussion_r48344118
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
@@ -231,7 +231,7 @@ case class NewInstance(
   s"new $className($argString)"
 }
 
-if (propagateNull) {
+if (propagateNull && argGen.nonEmpty) {
--- End diff --

A small fix: we should not go into this branch if `argGen` is empty, or
`argsNonNull` will be `!()`, which is invalid generated Java.
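
To see the failure mode concretely, a sketch assuming `argsNonNull` is built by OR-ing the per-argument null flags (the exact codegen string is inferred from the comment, not quoted from the source):

```scala
// Hypothetical shape of the codegen: OR together each argument's isNull flag
// and negate the whole expression.
val argIsNullFlags: Seq[String] = Seq.empty // e.g. Seq("isNull1", "isNull2") when args exist
val argsNonNull = s"!(${argIsNullFlags.mkString(" || ")})"
// With zero arguments the interpolation degenerates to the invalid Java
// expression "!()", hence the added argGen.nonEmpty guard.
println(argsNonNull) // prints: !()
```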





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10443#issuecomment-166916883
  
**[Test build #48242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48242/consoleFull)** for PR 10443 at commit [`ac3a88b`](https://github.com/apache/spark/commit/ac3a88b065e0639684e6e088fce90a3425b4ced1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12340][SQL] Fix overstep the bounds of ...

2015-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10310#issuecomment-166918642
  
@QiangCai thanks that looks good, but this needs a rebase now.





[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10446#issuecomment-166923370
  
**[Test build #48243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48243/consoleFull)** for PR 10446 at commit [`9864b68`](https://github.com/apache/spark/commit/9864b682eb5e432270bd1bbcc502a932a9a6b70b).





[GitHub] spark pull request: [SPARK-12500] [CORE] Fix Tachyon deprecations;...

2015-12-23 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/10449

[SPARK-12500] [CORE] Fix Tachyon deprecations; pull Tachyon dependency into one class

Fix Tachyon deprecations; pull Tachyon dependency into `TachyonBlockManager` only

CC @calvinjia as I probably need a double-check that the usage of the new API is correct.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-12500

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10449.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10449


commit 67605c0e8c06bee263c764d309b964f8915d11b8
Author: Sean Owen 
Date:   2015-12-23T15:59:59Z

Fix Tachyon deprecations; pull Tachyon dependency into TachyonBlockManager only







[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166899721
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48239/
Test FAILed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166899666
  
**[Test build #48239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48239/consoleFull)** for PR 10391 at commit [`7389e15`](https://github.com/apache/spark/commit/7389e157c1e1411a4115df29969359888f13c502).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166899720
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10435#issuecomment-166901054
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48241/
Test FAILed.





[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10435#issuecomment-166901014
  
**[Test build #48241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48241/consoleFull)** for PR 10435 at commit [`79a5738`](https://github.com/apache/spark/commit/79a5738396e3ad0990f6599249135963f458d357).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class JavaWordBlacklist `
   * `class JavaDroppedWordsCounter `
   * `case class Hash(children: Seq[Expression]) extends Expression `





[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10445#issuecomment-166891869
  
**[Test build #48237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48237/consoleFull)** for PR 10445 at commit [`41abd4e`](https://github.com/apache/spark/commit/41abd4e22b69d2c96e95eb5e4bd329b195731f1b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166891798
  
**[Test build #48238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48238/consoleFull)** for PR 10444 at commit [`6e2c052`](https://github.com/apache/spark/commit/6e2c0524723b6689e17550dad2b515a6e7602cb4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10443#issuecomment-166896497
  
**[Test build #48242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48242/consoleFull)** for PR 10443 at commit [`ac3a88b`](https://github.com/apache/spark/commit/ac3a88b065e0639684e6e088fce90a3425b4ced1).





[GitHub] spark pull request: [SPARK-12340][SQL] Fix overstep the bounds of ...

2015-12-23 Thread QiangCai
Github user QiangCai commented on the pull request:

https://github.com/apache/spark/pull/10310#issuecomment-166912120
  
@srowen I have modified all the code and tried to keep it consistent.

First, the vars `numPartsToTry` and `partsScanned` are now set to `Long`.
Second, because `p` should be a `Seq[Int]`, I added `.toInt` in the `until`
statement, as sketched below.
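
A sketch of that shape (names follow the comment; `totalParts` and the bound arithmetic are assumptions about the surrounding code):

```scala
// Sketch: the scan counters are Longs, while the partition id range `p` must
// be a Seq[Int]; hence the .toInt conversions on the Long bounds of `until`.
var partsScanned: Long = 0L
var numPartsToTry: Long = 4L
val totalParts: Long = 100L // hypothetical total partition count

val p: Seq[Int] =
  partsScanned.toInt until math.min(partsScanned + numPartsToTry, totalParts).toInt
```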





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166925236
  
retest this please.





[GitHub] spark pull request: [SPARK-12499] [build] don't force MAVEN_OPTS

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10448#issuecomment-166931510
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10445#issuecomment-166891943
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166891882
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10445#issuecomment-166891946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48237/
Test PASSed.





[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10444#issuecomment-166891883
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48238/
Test PASSed.





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10443#discussion_r48346034
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
@@ -231,7 +231,7 @@ case class NewInstance(
   s"new $className($argString)"
 }
 
-if (propagateNull) {
+if (propagateNull && argGen.nonEmpty) {
--- End diff --

Good catch!





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10443#discussion_r48350913
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
@@ -241,7 +241,7 @@ case class NewInstance(
       $javaType ${ev.value} = ${ctx.defaultValue(dataType)};
       if ($argsNonNull) {
         ${ev.value} = $constructorCall;
-        ${ev.isNull} = false;
+        ${ev.isNull} = ${ev.value} == null;
--- End diff --

Actually it's an irrelevant change, and I forgot to revert it before pushing
the commits. The reason I made this change is: if we think a constructor may
return null, why do we skip the null check in the `propagateNull = true`
branch? If @marmbrus thinks this change makes sense, we should also set
`nullable` to true.





[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-23 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/10384#discussion_r48355484
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala ---
@@ -23,15 +23,23 @@ import org.apache.spark.Logging
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, MultivariateOnlineSummarizer}
 import org.apache.spark.sql.DataFrame
-
 /**
  * Evaluator for regression.
  *
- * @param predictionAndObservations an RDD of (prediction, observation) pairs.
+ * @param predictionAndObservations an RDD of (prediction, observation) pairs,
+ * @param regThroughOrigin true if intercept is not included in linear regression model
--- End diff --

@dbtsai It may be nitpicking, but the `RegressionMetrics` class can be used for
any regression model, so `hasFitIntercept` doesn't make sense for all types of
regressors (as mentioned in other comments). Someone evaluating a decision tree
regression model might be confused by the parameter. I am not certain what is
best, so I will defer to others' opinions.





[GitHub] spark pull request: [SPARK-12500] [CORE] Fix Tachyon deprecations;...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10449#issuecomment-166934193
  
**[Test build #48247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48247/consoleFull)** for PR 10449 at commit [`67605c0`](https://github.com/apache/spark/commit/67605c0e8c06bee263c764d309b964f8915d11b8).





[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10435#issuecomment-166901052
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10443#issuecomment-166917017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48242/
Test PASSed.





[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10443#issuecomment-166917016
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166926556
  
**[Test build #48245 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48245/consoleFull)**
 for PR 10391 at commit 
[`f5b9d50`](https://github.com/apache/spark/commit/f5b9d507df3946a18426061d3dc0d85e17cca80a).





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166928166
  
**[Test build #48246 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48246/consoleFull)**
 for PR 10391 at commit 
[`f5b9d50`](https://github.com/apache/spark/commit/f5b9d507df3946a18426061d3dc0d85e17cca80a).





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166948106
  
**[Test build #48246 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48246/consoleFull)**
 for PR 10391 at commit 
[`f5b9d50`](https://github.com/apache/spark/commit/f5b9d507df3946a18426061d3dc0d85e17cca80a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48363721
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ---
@@ -0,0 +1,658 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.NoSuchElementException
+
+import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
+import javax.servlet.{DispatcherType, Filter, FilterChain, FilterConfig, 
ServletException, ServletRequest, ServletResponse}
+
+import scala.collection.JavaConverters._
+
+import com.codahale.metrics.{Counter, MetricRegistry, Timer}
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache, 
RemovalListener, RemovalNotification}
+import org.eclipse.jetty.servlet.FilterHolder
+
+import org.apache.spark.Logging
+import org.apache.spark.metrics.source.Source
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.Clock
+
+/**
+ * Cache for applications.
+ *
+ * Completed applications are cached for as long as there is capacity for 
them.
+ * Incomplete applications have their update time checked on every
+ * retrieval; if the cached entry is out of date, it is refreshed.
+ *
+ * @note there must be only one instance of [[ApplicationCache]] in a
+ * JVM at a time. This is because a static field in 
[[ApplicationCacheCheckFilterRelay]]
+ * keeps a reference to the cache so that HTTP requests on the 
attempt-specific web UIs
+ * can probe the current cache to see if the attempts have changed.
+ *
+ * Creating multiple instances will break this routing.
+ * @param operations implementation of record access operations
+ * @param refreshInterval interval between refreshes in milliseconds.
+ * @param retainedApplications number of retained applications
+ * @param clock time source
+ */
+private[history] class ApplicationCache(
+val operations: ApplicationCacheOperations,
+val refreshInterval: Long,
+val retainedApplications: Int,
+val clock: Clock) extends Logging {
+
+  /**
+   * Services the load request from the cache.
+   */
+  private val appLoader = new CacheLoader[CacheKey, CacheEntry] {
+
+/** the cache key doesn't match a cached entry ... attempt to load it 
 */
+override def load(key: CacheKey): CacheEntry = {
+  loadApplicationEntry(key.appId, key.attemptId)
+}
+
+  }
+
+  /**
+   * Handler for callbacks from the cache of entry removal.
+   */
+  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+
+/**
+ * Removal event notifies the provider to detach the UI.
+ * @param rm removal notification
+ */
+override def onRemoval(rm: RemovalNotification[CacheKey, CacheEntry]): 
Unit = {
+  metrics.evictionCount.inc()
+  val key = rm.getKey
+  logDebug(s"Evicting entry ${key}")
+  operations.detachSparkUI(key.appId, key.attemptId, rm.getValue().ui)
+}
+  }
+
+  /**
+   * The cache of applications.
+   *
+   * Tagged as `protected` so as to allow subclasses in tests to access 
it directly
+   */
+  protected val appCache: LoadingCache[CacheKey, CacheEntry] = {
+CacheBuilder.newBuilder()
+.maximumSize(retainedApplications)
+.removalListener(removalListener)
+.build(appLoader)
+  }
+
+  /**
+   * The metrics which are updated as the cache is used
+   */
+  val metrics = new CacheMetrics("history.cache")
+
+  init()
+
+  /**
+   * Perform any startup operations.
+   *
+   * This includes declaring this instance as the cache to use in the
+   * [[ApplicationCacheCheckFilterRelay]].
+   */
+  private def init(): Unit = {
+ApplicationCacheCheckFilterRelay.setApplicationCache(this)
+  }
+
+  /**

[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364543
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala 
---
@@ -0,0 +1,476 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.{Date, NoSuchElementException}
+
+import javax.servlet.Filter
+
+import scala.collection.mutable
+import scala.collection.mutable.ListBuffer
+import scala.language.postfixOps
+
+import com.codahale.metrics.Counter
+import com.google.common.cache.LoadingCache
+import com.google.common.util.concurrent.UncheckedExecutionException
+import org.eclipse.jetty.servlet.ServletContextHandler
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.mock.MockitoSugar
+
+import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => 
AttemptInfo, ApplicationInfo}
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.{Clock, ManualClock, Utils}
+import org.apache.spark.{Logging, SparkFunSuite}
+
+class ApplicationCacheSuite extends SparkFunSuite with Logging with 
MockitoSugar with Matchers {
+
+  /**
+   * subclass with access to the cache internals
+   * @param refreshInterval interval between refreshes in milliseconds.
+   * @param retainedApplications number of retained applications
+   */
+  class TestApplicationCache(
+  operations: ApplicationCacheOperations = new StubCacheOperations(),
+  refreshInterval: Long,
+  retainedApplications: Int,
+  clock: Clock = new ManualClock(0))
+  extends ApplicationCache(operations, refreshInterval, 
retainedApplications, clock) {
+
+def cache(): LoadingCache[CacheKey, CacheEntry] = appCache
+  }
+
+  /**
+   * Stub cache operations.
+   * The state is kept in a map of [[CacheKey]] to [[CacheEntry]],
+   * the `probeTime` field in the cache entry setting the timestamp of the 
entry
+   */
+  class StubCacheOperations extends ApplicationCacheOperations with 
Logging {
+
+/** map to UI instances, including timestamps, which are used in 
update probes */
+val instances = mutable.HashMap.empty[CacheKey, CacheEntry]
+
+/** Map of attached spark UIs */
+val attached = mutable.HashMap.empty[CacheKey, SparkUI]
+
+var getAppUICount = 0L
+var attachCount = 0L
+var detachCount = 0L
+var updateProbeCount = 0L
+
+/**
+ * Get the application UI
+ * @param appId application ID
+ * @param attemptId attempt ID
+ * @return If found, the Spark UI and any history information to be 
used in the cache
+ */
+override def getAppUI(appId: String, attemptId: Option[String]): 
Option[LoadedAppUI] = {
+  logDebug(s"getAppUI($appId, $attemptId)")
+  getAppUICount += 1
+  instances.get(CacheKey(appId, attemptId)).map( e =>
+LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime))))
+}
+
+override def attachSparkUI(appId: String, attemptId: Option[String], 
ui: SparkUI,
+completed: Boolean): Unit = {
+  logDebug(s"attachSparkUI($appId, $attemptId, $ui)")
+  attachCount += 1
+  attached += (CacheKey(appId, attemptId) -> ui)
+}
+
+def putAndAttach(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = putAppUI(appId, attemptId, completed, started, ended, 
timestamp)
+  attachSparkUI(appId, attemptId, ui, completed)
+  ui
+}
+
+def putAppUI(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = newUI(appId, attemptId, completed, started, ended)
+  putInstance(appId, attemptId, ui, 

[GitHub] spark pull request: [SPARK-12500] [CORE] Fix Tachyon deprecations;...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10449#issuecomment-166957539
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48247/
Test PASSed.





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48367142
  
--- Diff: docs/monitoring.md ---
@@ -69,36 +83,53 @@ follows:
   
 
 
+### Spark configuration options
+
 
  Property Name | Default | Meaning
   
 spark.history.provider
-org.apache.spark.deploy.history.FsHistoryProvider
+org.apache.spark.deploy.history.FsHistoryProvider
 Name of the class implementing the application history backend. 
Currently there is only
 one implementation, provided by Spark, which looks for application 
logs stored in the
 file system.
   
   
+spark.history.retainedApplications
+50
--- End diff --

I'm not entirely sure why that ended up there. You could make the case for 
separating the fs options from the other ones, but really the most important 
one is the log dir value, as without it nothing else works.
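
For anyone skimming the thread, a minimal configuration fragment using only 
properties quoted in this diff (values illustrative) makes the point:

```scala
import org.apache.spark.SparkConf

// Illustrative only: the one setting nothing works without is the log
// directory; the other two are tuning knobs with the defaults quoted above.
val historyConf = new SparkConf()
  .set("spark.history.fs.logDirectory", "hdfs://namenode/shared/spark-logs")
  .set("spark.history.retainedApplications", "50")
  .set("spark.history.fs.update.interval", "10s")
```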





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48368082
  
--- Diff: docs/monitoring.md ---
@@ -69,36 +83,53 @@ follows:
   
 
 
+### Spark configuration options
+
 
  Property Name | Default | Meaning
   
 spark.history.provider
-org.apache.spark.deploy.history.FsHistoryProvider
+org.apache.spark.deploy.history.FsHistoryProvider
 Name of the class implementing the application history backend. 
Currently there is only
 one implementation, provided by Spark, which looks for application 
logs stored in the
 file system.
   
   
+spark.history.retainedApplications
+50
+
+  The number of application UIs to retain. If this cap is exceeded, 
then the oldest
+  applications will be removed.
+
+  
+  
 spark.history.fs.logDirectory
 file:/tmp/spark-events
 
- Directory that contains application event logs to be loaded by the 
history server
+For the filesystem history provider, the URL to the directory 
containing application event
+logs to load. This can be a local file:// path,
+an HDFS path hdfs://namenode/shared/spark-logs
+or that of an alternative filesystem supported by the Hadoop APIs.
 
   
   
 spark.history.fs.update.interval
 10s
 
-  The period at which information displayed by this history server is 
updated.
-  Each update checks for any changes made to the event logs in 
persisted storage.
+  The period at which the filesystem history provider checks for new or
+  updated logs in the log directory. A shorter interval detects new 
applications faster,
+  at the expense of more server load re-reading updated applications.
+  As soon as an update has completed, listings of the completed and 
incomplete applications
+  will reflect the changes. For performance reasons, the UIs of web 
applications are
+  only updated at a slower interval, that defined in 
spark.history.cache.window 
--- End diff --

There are three costs in the system: listing cost, probe cost and replay 
cost.

* listing cost is pretty expensive in the history server, as it replays the 
entire history just to get a few flags which could be cached alongside 
(completed flag, etc). That's why it can be slow to start up. After startup the 
async replay is only done on changed data. Load on HDFS: negligible.
* probe cost: simply checking the internal state of things updated in the 
update thread, ~0.
* replay cost: expensive, O(events), so essentially O(filesize). Again, 
HDFS doesn't notice.

The rationale for having a probe interval is not so much code cost as replay 
cost: a 15s probe interval would mean "a user clicking through the UI of a busy 
app could trigger a reload every 15s". I don't have the stats to decide how 
good or bad that is, but a longer interval worries me less.

FWIW, the Yarn timeline provider costs are:

* listing cost: less expensive than for the FS history provider, but it does 
move some of the load into the timeline server (search of the database, 
serialization of the result).
* probe cost: ~0 again.
* replay cost: the same replay costs as for the FS provider, but now with 
JSON serialization and transmission over HTTP added.

I suspect there you'd want a longer interval for probes, just to keep those 
replays down.

Again: more data is needed here. I've added the metrics to the cache as a 
start; add metrics publishing to the history server and this code is ready to 
be hooked up to show the numbers on cache reload operations.
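
To illustrate the probe-interval idea in isolation, here is a minimal sketch 
(hypothetical names, not the patch's actual classes) of throttling update 
probes so that rapid UI clicks don't each trigger a replay:

```scala
object ProbeWindowExample {

  // Allow at most one probe per window; callers that lose the race serve
  // the cached UI instead of triggering another replay.
  class ProbeWindow(
      windowMs: Long,
      clock: () => Long = () => System.currentTimeMillis()) {

    private var lastProbe: Option[Long] = None

    /** Returns true at most once per window. */
    def shouldProbe(): Boolean = synchronized {
      val now = clock()
      if (lastProbe.forall(now - _ >= windowMs)) {
        lastProbe = Some(now)
        true
      } else false
    }
  }

  def main(args: Array[String]): Unit = {
    // With a 15s window there is at most one replay per app per 15 seconds,
    // however fast a user clicks through the UI.
    val window = new ProbeWindow(15000L)
    if (window.shouldProbe()) {
      // re-check the log file size here and replay only if it grew
    }
  }
}
```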





[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL] Incorrect resu...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10450#issuecomment-166950936
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364727
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala 
---
@@ -0,0 +1,476 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.{Date, NoSuchElementException}
+
+import javax.servlet.Filter
+
+import scala.collection.mutable
+import scala.collection.mutable.ListBuffer
+import scala.language.postfixOps
+
+import com.codahale.metrics.Counter
+import com.google.common.cache.LoadingCache
+import com.google.common.util.concurrent.UncheckedExecutionException
+import org.eclipse.jetty.servlet.ServletContextHandler
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.mock.MockitoSugar
+
+import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => 
AttemptInfo, ApplicationInfo}
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.{Clock, ManualClock, Utils}
+import org.apache.spark.{Logging, SparkFunSuite}
+
+class ApplicationCacheSuite extends SparkFunSuite with Logging with 
MockitoSugar with Matchers {
+
+  /**
+   * subclass with access to the cache internals
+   * @param refreshInterval interval between refreshes in milliseconds.
+   * @param retainedApplications number of retained applications
+   */
+  class TestApplicationCache(
+  operations: ApplicationCacheOperations = new StubCacheOperations(),
+  refreshInterval: Long,
+  retainedApplications: Int,
+  clock: Clock = new ManualClock(0))
+  extends ApplicationCache(operations, refreshInterval, 
retainedApplications, clock) {
+
+def cache(): LoadingCache[CacheKey, CacheEntry] = appCache
+  }
+
+  /**
+   * Stub cache operations.
+   * The state is kept in a map of [[CacheKey]] to [[CacheEntry]],
+   * the `probeTime` field in the cache entry setting the timestamp of the 
entry
+   */
+  class StubCacheOperations extends ApplicationCacheOperations with 
Logging {
+
+/** map to UI instances, including timestamps, which are used in 
update probes */
+val instances = mutable.HashMap.empty[CacheKey, CacheEntry]
+
+/** Map of attached spark UIs */
+val attached = mutable.HashMap.empty[CacheKey, SparkUI]
+
+var getAppUICount = 0L
+var attachCount = 0L
+var detachCount = 0L
+var updateProbeCount = 0L
+
+/**
+ * Get the application UI
+ * @param appId application ID
+ * @param attemptId attempt ID
+ * @return If found, the Spark UI and any history information to be 
used in the cache
+ */
+override def getAppUI(appId: String, attemptId: Option[String]): 
Option[LoadedAppUI] = {
+  logDebug(s"getAppUI($appId, $attemptId)")
+  getAppUICount += 1
+  instances.get(CacheKey(appId, attemptId)).map( e =>
+LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime))))
+}
+
+override def attachSparkUI(appId: String, attemptId: Option[String], 
ui: SparkUI,
+completed: Boolean): Unit = {
+  logDebug(s"attachSparkUI($appId, $attemptId, $ui)")
+  attachCount += 1
+  attached += (CacheKey(appId, attemptId) -> ui)
+}
+
+def putAndAttach(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = putAppUI(appId, attemptId, completed, started, ended, 
timestamp)
+  attachSparkUI(appId, attemptId, ui, completed)
+  ui
+}
+
+def putAppUI(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = newUI(appId, attemptId, completed, started, ended)
+  putInstance(appId, attemptId, ui, 

[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364720
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala 
---
@@ -0,0 +1,476 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.{Date, NoSuchElementException}
+
+import javax.servlet.Filter
+
+import scala.collection.mutable
+import scala.collection.mutable.ListBuffer
+import scala.language.postfixOps
+
+import com.codahale.metrics.Counter
+import com.google.common.cache.LoadingCache
+import com.google.common.util.concurrent.UncheckedExecutionException
+import org.eclipse.jetty.servlet.ServletContextHandler
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.mock.MockitoSugar
+
+import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => 
AttemptInfo, ApplicationInfo}
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.{Clock, ManualClock, Utils}
+import org.apache.spark.{Logging, SparkFunSuite}
+
+class ApplicationCacheSuite extends SparkFunSuite with Logging with 
MockitoSugar with Matchers {
+
+  /**
+   * subclass with access to the cache internals
+   * @param refreshInterval interval between refreshes in milliseconds.
+   * @param retainedApplications number of retained applications
+   */
+  class TestApplicationCache(
+  operations: ApplicationCacheOperations = new StubCacheOperations(),
+  refreshInterval: Long,
+  retainedApplications: Int,
+  clock: Clock = new ManualClock(0))
+  extends ApplicationCache(operations, refreshInterval, 
retainedApplications, clock) {
+
+def cache(): LoadingCache[CacheKey, CacheEntry] = appCache
+  }
+
+  /**
+   * Stub cache operations.
+   * The state is kept in a map of [[CacheKey]] to [[CacheEntry]],
+   * the `probeTime` field in the cache entry setting the timestamp of the 
entry
+   */
+  class StubCacheOperations extends ApplicationCacheOperations with 
Logging {
+
+/** map to UI instances, including timestamps, which are used in 
update probes */
+val instances = mutable.HashMap.empty[CacheKey, CacheEntry]
+
+/** Map of attached spark UIs */
+val attached = mutable.HashMap.empty[CacheKey, SparkUI]
+
+var getAppUICount = 0L
+var attachCount = 0L
+var detachCount = 0L
+var updateProbeCount = 0L
+
+/**
+ * Get the application UI
+ * @param appId application ID
+ * @param attemptId attempt ID
+ * @return If found, the Spark UI and any history information to be 
used in the cache
+ */
+override def getAppUI(appId: String, attemptId: Option[String]): 
Option[LoadedAppUI] = {
+  logDebug(s"getAppUI($appId, $attemptId)")
+  getAppUICount += 1
+  instances.get(CacheKey(appId, attemptId)).map( e =>
+LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime))))
+}
+
+override def attachSparkUI(appId: String, attemptId: Option[String], 
ui: SparkUI,
+completed: Boolean): Unit = {
+  logDebug(s"attachSparkUI($appId, $attemptId, $ui)")
+  attachCount += 1
+  attached += (CacheKey(appId, attemptId) -> ui)
+}
+
+def putAndAttach(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = putAppUI(appId, attemptId, completed, started, ended, 
timestamp)
+  attachSparkUI(appId, attemptId, ui, completed)
+  ui
+}
+
+def putAppUI(appId: String, attemptId: Option[String], completed: 
Boolean, started: Long,
+ended: Long, timestamp: Long): SparkUI = {
+  val ui = newUI(appId, attemptId, completed, started, ended)
+  putInstance(appId, attemptId, ui, 

[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10446#issuecomment-166960152
  
**[Test build #2253 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2253/consoleFull)**
 for PR 10446 at commit 
[`9864b68`](https://github.com/apache/spark/commit/9864b682eb5e432270bd1bbcc502a932a9a6b70b).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166963932
  
**[Test build #48248 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48248/consoleFull)**
 for PR 10451 at commit 
[`56fd782`](https://github.com/apache/spark/commit/56fd78267618a3b3aab73ca01d682f4fb25a638c).





[GitHub] spark pull request: [SPARK-12486] Worker should kill the executors...

2015-12-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10438#issuecomment-166969653
  
Looks good.





[GitHub] spark pull request: [SPARK-12396][Core] Modify the function schedu...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10447#issuecomment-166969718
  
**[Test build #48249 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48249/consoleFull)**
 for PR 10447 at commit 
[`34cab3a`](https://github.com/apache/spark/commit/34cab3a7afafacc96656d7e0ff735cb642d8b683).





[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...

2015-12-23 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10368#discussion_r48372722
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -154,6 +155,19 @@ class KryoSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
   mutable.HashMap(1 -> "one", 2 -> "two", 3 -> "three")))
   }
 
+  test("Bug: SPARK-12415") {
+val ser = new 
KryoSerializer(conf.clone.set("spark.kryo.registrationRequired", "true"))
+  .newInstance()
+def check[T: ClassTag](t: T) {
+  val ret = ser.deserialize[T](ser.serialize(t))
+  assert(ret.equals(t), "Deser " + ret + " orig " + t)
--- End diff --

Nit: Use `==` instead.
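
As a general Scala point (not specific to this patch): `==` is null-safe and 
delegates to `equals`, so the assertion can be written as

```scala
// equivalent to ret.equals(t) when ret is non-null, and safe when it is null
assert(ret == t, s"Deser $ret orig $t")
```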





[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10446#issuecomment-166947653
  
**[Test build #48243 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48243/consoleFull)**
 for PR 10446 at commit 
[`9864b68`](https://github.com/apache/spark/commit/9864b682eb5e432270bd1bbcc502a932a9a6b70b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10446#issuecomment-166947721
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12481] [CORE] [STREAMING] [SQL] Remove ...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10446#issuecomment-166947723
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48243/
Test FAILed.





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364439
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -430,8 +517,55 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
 newIterator.foreach(addIfAbsent)
 oldIterator.foreach(addIfAbsent)
+mergedApps
+  }
+
+
+  /**
+   * Scan through all known incomplete applications, and check for any 
that have been updated.
+   *
+   * After the scan, if there were any updated attempts, [[applications]] 
is updated
+   * with the new values.
+   *
+   * 1. No attempt to replay the application is made; this scan is a low 
cost operation.
+   * 2. As this overwrites [[applications]] with a new value, it must not 
run concurrently
+   * with the main scan for new applications. That is: it must be in the 
[[checkForLogs()]]
+   * operation.
+   */
+  private[history] def scanAndUpdateIncompleteAttemptInfo(): Unit = {
+val newAttempts: Iterable[FsApplicationAttemptInfo] = applications
+.filter( e => !e._2.completed)
+.flatMap { e =>
+  // build list of (false, attempt) or (true, attempt') values
+  e._2.attempts.flatMap { prevInfo: FsApplicationAttemptInfo =>
+val path = new Path(logDir, prevInfo.logPath)
+try {
+  val status = fs.getFileStatus(path)
+  val size = getLogSize(status).getOrElse(-1L)
+  val aS = prevInfo.fileSize
+  if (size > aS) {
+logDebug(s"Attempt ${prevInfo.name}/${prevInfo.appId} size 
=> $size")
+Some(new FsApplicationAttemptInfo(prevInfo.logPath, 
prevInfo.name, prevInfo.appId,
+  prevInfo.attemptId, prevInfo.startTime, 
prevInfo.endTime, prevInfo.lastUpdated,
+  prevInfo.sparkUser, prevInfo.completed, size, 
attemptCounter.incrementAndGet()))
--- End diff --

done





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364502
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala 
---
@@ -0,0 +1,476 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.{Date, NoSuchElementException}
+
+import javax.servlet.Filter
--- End diff --

fixed





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48366527
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala ---
@@ -21,15 +21,28 @@ import java.net.{HttpURLConnection, URL}
 import java.util.zip.ZipInputStream
 import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
 
+import scala.concurrent.duration._
+import scala.language.postfixOps
+
+import com.codahale.metrics.Counter
 import com.google.common.base.Charsets
 import com.google.common.io.{ByteStreams, Files}
 import org.apache.commons.io.{FileUtils, IOUtils}
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+import org.json4s.JsonAST.JArray
+import org.json4s.jackson.JsonMethods._
 import org.mockito.Mockito.when
+import org.openqa.selenium.htmlunit.HtmlUnitDriver
+import org.openqa.selenium.WebDriver
+import org.scalatest.concurrent.Eventually
+import org.scalatest.selenium.WebBrowser
 import org.scalatest.{BeforeAndAfter, Matchers}
 import org.scalatest.mock.MockitoSugar
 
-import org.apache.spark.{JsonTestUtils, SecurityManager, SparkConf, 
SparkFunSuite}
+import org.apache.spark.ui.jobs.UIData.JobUIData
 import org.apache.spark.ui.{SparkUI, UIUtils}
+import org.apache.spark._
--- End diff --

I'll try and use that plugin very carefully and see what happens...





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48366891
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala ---
@@ -281,6 +296,191 @@ class HistoryServerSuite extends SparkFunSuite with 
BeforeAndAfter with Matchers
 all (siteRelativeLinks) should startWith (uiRoot)
   }
 
+  test("incomplete apps get refreshed") {
+
+implicit val webDriver: WebDriver = new HtmlUnitDriver
+implicit val formats = org.json4s.DefaultFormats
+
+// this test dir is explicitly deleted on successful runs; retained for 
diagnostics when
+// not
+val logDir = 
Utils.createDirectory(System.getProperty("java.io.tmpdir", "logs"))
+
+// a new conf is used with the background thread set and running at 
its fastest
+// allowed refresh rate (1Hz)
+val myConf = new SparkConf()
+  .set("spark.history.fs.logDirectory", logDir.getAbsolutePath)
+  .set("spark.eventLog.dir", logDir.getAbsolutePath)
+  .set("spark.history.fs.update.interval", "1s")
+  .set("spark.eventLog.enabled", "true")
+  .set("spark.history.cache.window", "250ms")
+  .remove("spark.testing")
+val provider = new FsHistoryProvider(myConf)
+val securityManager = new SecurityManager(myConf)
+
+val sc = new SparkContext("local", "test", myConf)
+val logDirUri = logDir.toURI
+val logDirPath = new Path(logDirUri)
+val fs = FileSystem.get(logDirUri, sc.hadoopConfiguration)
+
+def listDir(dir: Path): Seq[FileStatus] = {
+  val statuses = fs.listStatus(dir)
+  statuses.flatMap(
+stat => if (stat.isDirectory) listDir(stat.getPath) else Seq(stat))
+}
+
+def dumpLogDir(msg: String = ""): Unit = {
+  if (log.isDebugEnabled) {
+logDebug(msg)
+listDir(logDirPath).foreach { status =>
+  val s = status.toString
+  logDebug(s)
+}
+  }
+}
+
+// stop the server with the old config, and start the new one
+server.stop()
+server = new HistoryServer(myConf, provider, securityManager, 18080)
+server.initialize()
+server.bind()
+try {
+  val port = server.boundPort
+  val metrics = server.cacheMetrics
+
+  // assert that a metric has a value; if not dump the whole metrics 
instance
+  def assertMetric(name: String, counter: Counter, expected: Long): 
Unit = {
+val actual = counter.getCount
+if (actual != expected) {
+  // this is here because Scalatest loses stack depth
+  fail(s"Wrong $name value - expected $expected but got $actual" +
+  s" in metrics\n$metrics")
+}
+  }
+
+  // build a URL for an app or app/attempt plus a page underneath
+  def buildURL(appId: String, suffix: String): URL = {
+new URL(s"http://localhost:$port/history/$appId$suffix;)
+  }
+  val historyServerRoot = new URL(s"http://localhost:$port/")
+  val historyServerIncompleted = new URL(historyServerRoot, 
"?page=1=true")
+
+  // assert the body of a URL contains a string; return that body
+  def assertUrlContains(url: URL, str: String): String = {
+val body = HistoryServerSuite.getUrl(url)
+assert(body.contains(str), s"did not find $str at $url : $body")
+body
+  }
+
+  // start initial job
+  val d = sc.parallelize(1 to 10)
+  d.count()
+  val stdInterval = interval(100 milliseconds)
+  val appId = eventually(timeout(20 seconds), stdInterval) {
+val json = getContentAndCode("applications", port)._2.get
+val apps = parse(json).asInstanceOf[JArray].arr
+apps should have size 1
+(apps.head \ "id").extract[String]
+  }
+
+  // which lists as incomplete
+  assertUrlContains(historyServerIncompleted, appId)
+
+  val appIdRoot = buildURL(appId, "")
+  val rootAppPage = HistoryServerSuite.getUrl(appIdRoot)
+  logDebug(s"$appIdRoot ->[${rootAppPage.length}] \n$rootAppPage")
+  // sanity check to make sure filter is chaining calls
+  rootAppPage should not be empty
+
+  def getAppUI: SparkUI = {
+provider.getAppUI(appId, None).get.ui
+  }
+
+  // selenium isn't that useful on failures...add our own reporting
+  def getNumJobs(suffix: String): Int = {
+val target = buildURL(appId, suffix)
+val targetBody = HistoryServerSuite.getUrl(target)
+try {
+  go to target.toExternalForm
+  findAll(cssSelector("tbody 

[GitHub] spark pull request: [SPARK-12396][Core] Modify the function schedu...

2015-12-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10447#issuecomment-166967310
  
add to whitelist





[GitHub] spark pull request: [SPARK-12396][Core] Modify the function schedu...

2015-12-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10447#issuecomment-166970244
  
Changes LGTM. I'll merge this once tests pass





[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...

2015-12-23 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10368#discussion_r48371027
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala 
---
@@ -46,6 +46,26 @@ class DirectTaskResult[T](var valueBytes: ByteBuffer, 
var accumUpdates: Map[Long
 
   def this() = this(null.asInstanceOf[ByteBuffer], null, null)
 
+  override def toString: String = valueBytes.toString + " " + accumUpdates 
+ " " + metrics
+
+  override def equals(other: Any): Boolean = other match {
+case that: DirectTaskResult[_] => {
+  if (!this.valueBytes.equals(that.valueBytes)) return false
+
+  val accumSize = if (accumUpdates != null) accumUpdates.size else 0
+  val thatAccumSize = if (that.accumUpdates != null) 
that.accumUpdates.size else 0
+  if (accumSize != thatAccumSize) return false
+  if (accumSize > 0) {
+val b = this.accumUpdates.keys.forall { key =>
+  this.accumUpdates.get(key) == that.accumUpdates.get(key)
+}
+if (!b) return false;
+  }
+  this.metrics.equals(that.metrics)
+}
--- End diff --

similarly, this can be simplified
```
val accumEquals =
  if (accumUpdates != null) {
accumUpdates.keys.forall { k => accumUpdates.get(k) == 
that.accumUpdates.get(k) }
  } else {
that.accumUpdates == null
  }
  valueBytes.equals(that.valueBytes) &&
  metrics.equals(that.metrics) &&
  accumEquals
```





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48363399
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ---
@@ -0,0 +1,658 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.NoSuchElementException
+
+import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
+import javax.servlet.{DispatcherType, Filter, FilterChain, FilterConfig, 
ServletException, ServletRequest, ServletResponse}
+
+import scala.collection.JavaConverters._
+
+import com.codahale.metrics.{Counter, MetricRegistry, Timer}
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache, 
RemovalListener, RemovalNotification}
+import org.eclipse.jetty.servlet.FilterHolder
+
+import org.apache.spark.Logging
+import org.apache.spark.metrics.source.Source
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.Clock
+
+/**
+ * Cache for applications.
+ *
+ * Completed applications are cached for as long as there is capacity for 
them.
+ * Incomplete applications have their update time checked on every
+ * retrieval; if the cached entry is out of date, it is refreshed.
+ *
+ * @note there must be only one instance of [[ApplicationCache]] in a
+ * JVM at a time. This is because a static field in 
[[ApplicationCacheCheckFilterRelay]]
+ * keeps a reference to the cache so that HTTP requests on the 
attempt-specific web UIs
+ * can probe the current cache to see if the attempts have changed.
+ *
+ * Creating multiple instances will break this routing.
+ * @param operations implementation of record access operations
+ * @param refreshInterval interval between refreshes in milliseconds.
+ * @param retainedApplications number of retained applications
+ * @param clock time source
+ */
+private[history] class ApplicationCache(
+val operations: ApplicationCacheOperations,
+val refreshInterval: Long,
+val retainedApplications: Int,
+val clock: Clock) extends Logging {
+
+  /**
+   * Services the load request from the cache.
+   */
+  private val appLoader = new CacheLoader[CacheKey, CacheEntry] {
+
+/** the cache key doesn't match a cached entry ... attempt to load it 
 */
+override def load(key: CacheKey): CacheEntry = {
+  loadApplicationEntry(key.appId, key.attemptId)
+}
+
+  }
+
+  /**
+   * Handler for callbacks from the cache of entry removal.
+   */
+  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+
+/**
+ * Removal event notifies the provider to detach the UI.
+ * @param rm removal notification
+ */
+override def onRemoval(rm: RemovalNotification[CacheKey, CacheEntry]): 
Unit = {
+  metrics.evictionCount.inc()
+  val key = rm.getKey
+  logDebug(s"Evicting entry ${key}")
+  operations.detachSparkUI(key.appId, key.attemptId, rm.getValue().ui)
+}
+  }
+
+  /**
+   * The cache of applications.
+   *
+   * Tagged as `protected` so as to allow subclasses in tests to access 
it directly
+   */
+  protected val appCache: LoadingCache[CacheKey, CacheEntry] = {
+CacheBuilder.newBuilder()
+.maximumSize(retainedApplications)
+.removalListener(removalListener)
+.build(appLoader)
+  }
+
+  /**
+   * The metrics which are updated as the cache is used
+   */
+  val metrics = new CacheMetrics("history.cache")
+
+  init()
+
+  /**
+   * Perform any startup operations.
+   *
+   * This includes declaring this instance as the cache to use in the
+   * [[ApplicationCacheCheckFilterRelay]].
+   */
+  private def init(): Unit = {
+ApplicationCacheCheckFilterRelay.setApplicationCache(this)
+  }
+
+  /**

[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48364066
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -430,8 +517,55 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
 newIterator.foreach(addIfAbsent)
 oldIterator.foreach(addIfAbsent)
+mergedApps
+  }
+
+
+  /**
+   * Scan through all known incomplete applications, and check for any 
that have been updated.
+   *
+   * After the scan, if there were any updated attempts, [[applications]] 
is updated
+   * with the new values.
+   *
+   * 1. No attempt to replay the application is made; this scan is a low 
cost operation.
+   * 2. As this overwrites [[applications]] with a new value, it must not 
run concurrently
+   * with the main scan for new applications. That is: it must be in the 
[[checkForLogs()]]
+   * operation.
+   */
+  private[history] def scanAndUpdateIncompleteAttemptInfo(): Unit = {
+val newAttempts: Iterable[FsApplicationAttemptInfo] = applications
+.filter( e => !e._2.completed)
+.flatMap { e =>
+  // build list of (false, attempt) or (true, attempt') values
--- End diff --

Pasted in your text. Also noted that if a file isn't found, that isn't 
treated as an update/state change.





[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

2015-12-23 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10374#issuecomment-166949939
  
I would prefer to change the implementation of toSeq.
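
A minimal sketch of one way to read that suggestion (hypothetical names, 
assuming the row wraps a mutable `Array[Any]`; not the actual Catalyst code): 
have `toSeq` hand out a defensive copy, so that a `copy()` built on top of it 
cannot share state with the original row.

```scala
// Hypothetical sketch only: toSeq clones the backing array, and copy() is
// expressed in terms of it, so the copied row never observes later
// mutation of the original's values.
class RowSketch(private val values: Array[Any]) {
  def toSeq: Seq[Any] = values.clone().toSeq
  def copy(): RowSketch = new RowSketch(toSeq.toArray)
}
```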





[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2015-12-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/6935#discussion_r48366085
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala ---
@@ -0,0 +1,476 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.util.{Date, NoSuchElementException}
+
+import javax.servlet.Filter
+
+import scala.collection.mutable
+import scala.collection.mutable.ListBuffer
+import scala.language.postfixOps
+
+import com.codahale.metrics.Counter
+import com.google.common.cache.LoadingCache
+import com.google.common.util.concurrent.UncheckedExecutionException
+import org.eclipse.jetty.servlet.ServletContextHandler
+import org.mockito.Mockito._
+import org.scalatest.Matchers
+import org.scalatest.mock.MockitoSugar
+
+import org.apache.spark.status.api.v1.{ApplicationAttemptInfo => AttemptInfo, ApplicationInfo}
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.{Clock, ManualClock, Utils}
+import org.apache.spark.{Logging, SparkFunSuite}
+
+class ApplicationCacheSuite extends SparkFunSuite with Logging with MockitoSugar with Matchers {
+
+  /**
+   * Subclass with access to the cache internals.
+   * @param refreshInterval interval between refreshes in milliseconds.
+   * @param retainedApplications number of retained applications
+   */
+  class TestApplicationCache(
+      operations: ApplicationCacheOperations = new StubCacheOperations(),
+      refreshInterval: Long,
+      retainedApplications: Int,
+      clock: Clock = new ManualClock(0))
+    extends ApplicationCache(operations, refreshInterval, retainedApplications, clock) {
+
+    def cache(): LoadingCache[CacheKey, CacheEntry] = appCache
+  }
+
+  /**
+   * Stub cache operations.
+   * The state is kept in a map of [[CacheKey]] to [[CacheEntry]],
+   * the `probeTime` field in the cache entry setting the timestamp of the entry.
+   */
+  class StubCacheOperations extends ApplicationCacheOperations with Logging {
+
+    /** Map to UI instances, including timestamps, which are used in update probes. */
+    val instances = mutable.HashMap.empty[CacheKey, CacheEntry]
+
+    /** Map of attached Spark UIs. */
+    val attached = mutable.HashMap.empty[CacheKey, SparkUI]
+
+    var getAppUICount = 0L
+    var attachCount = 0L
+    var detachCount = 0L
+    var updateProbeCount = 0L
+
+    /**
+     * Get the application UI.
+     * @param appId application ID
+     * @param attemptId attempt ID
+     * @return If found, the Spark UI and any history information to be used in the cache
+     */
+    override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
+      logDebug(s"getAppUI($appId, $attemptId)")
+      getAppUICount += 1
+      instances.get(CacheKey(appId, attemptId)).map(e =>
+        LoadedAppUI(e.ui, Some(new StubHistoryProviderUpdateState(e.probeTime))))
+    }
+
+    override def attachSparkUI(appId: String, attemptId: Option[String], ui: SparkUI,
+        completed: Boolean): Unit = {
+      logDebug(s"attachSparkUI($appId, $attemptId, $ui)")
+      attachCount += 1
+      attached += (CacheKey(appId, attemptId) -> ui)
+    }
+
+    def putAndAttach(appId: String, attemptId: Option[String], completed: Boolean,
+        started: Long, ended: Long, timestamp: Long): SparkUI = {
+      val ui = putAppUI(appId, attemptId, completed, started, ended, timestamp)
+      attachSparkUI(appId, attemptId, ui, completed)
+      ui
+    }
+
+    def putAppUI(appId: String, attemptId: Option[String], completed: Boolean,
+        started: Long, ended: Long, timestamp: Long): SparkUI = {
+      val ui = newUI(appId, attemptId, completed, started, ended)
+      putInstance(appId, attemptId, ui, 
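
The quoted diff is truncated here by the mail archive. Purely as a hypothetical sketch (the 
constructor arguments, the loader behaviour, and the assertion are assumptions, not the 
suite's actual test code), the stubs above might be wired up like this:

```scala
// Wire a TestApplicationCache to the stub operations and a manual clock,
// seed one incomplete attempt, then look it up through the cache.
val operations = new StubCacheOperations()
val clock = new ManualClock(0)
val cache = new TestApplicationCache(operations,
  refreshInterval = 10, retainedApplications = 2, clock = clock)

operations.putAppUI("app1", None, completed = false,
  started = 0L, ended = 100L, timestamp = clock.getTimeMillis())

// The cache miss should route through StubCacheOperations.getAppUI,
// bumping its counter (assuming the cache loader delegates to getAppUI).
cache.cache().get(CacheKey("app1", None))
assert(operations.getAppUICount == 1)
```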

[GitHub] spark pull request: [SPARK-12499] [build] don't force MAVEN_OPTS

2015-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10448#issuecomment-166958625
  
LGTM





[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...

2015-12-23 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10368#discussion_r48370672
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -215,6 +215,19 @@ class TaskMetrics extends Serializable {
     inputMetrics.foreach(_.updateBytesRead())
   }
 
+  override def equals(other: Any): Boolean = other match {
+    case that: TaskMetrics => {
+      if (this.executorDeserializeTime != that.executorDeserializeTime) return false
+      if (this.executorRunTime != that.executorRunTime) return false
+      if (this.resultSize != that.resultSize) return false
+      if (this.jvmGCTime != that.jvmGCTime) return false
+      if (this.resultSerializationTime != that.resultSerializationTime) return false
+      if (this.hostname == null && that.hostname != null) return false
+      if (this.hostname != null) this.hostname.equals(that.hostname) else true
+    }
--- End diff --

It would be better if this looked like:
```
this.executorDeserializeTime == that.executorDeserializeTime &&
this.executorRunTime == that.executorRunTime &&
...
this.hostname == that.hostname
```
There's no need to call `equals` here because we're in Scala, where `==` already delegates 
to `equals` and handles nulls.
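
A minimal sketch of the suggested style, with the elided fields filled in from the quoted 
diff (the `case _ => false` arm is an added assumption):

```scala
override def equals(other: Any): Boolean = other match {
  case that: TaskMetrics =>
    this.executorDeserializeTime == that.executorDeserializeTime &&
      this.executorRunTime == that.executorRunTime &&
      this.resultSize == that.resultSize &&
      this.jvmGCTime == that.jvmGCTime &&
      this.resultSerializationTime == that.resultSerializationTime &&
      this.hostname == that.hostname // == is null-safe in Scala
  case _ => false
}
```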





[GitHub] spark pull request: [SPARK-12504][SQL] Masking credentials in the ...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10452#issuecomment-166970399
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...

2015-12-23 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10368#discussion_r48372093
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala ---
@@ -46,6 +46,26 @@ class DirectTaskResult[T](var valueBytes: ByteBuffer, var accumUpdates: Map[Long
 
   def this() = this(null.asInstanceOf[ByteBuffer], null, null)
 
+  override def toString: String = valueBytes.toString + " " + accumUpdates + " " + metrics
+
+  override def equals(other: Any): Boolean = other match {
+    case that: DirectTaskResult[_] => {
+      if (!this.valueBytes.equals(that.valueBytes)) return false
+
+      val accumSize = if (accumUpdates != null) accumUpdates.size else 0
+      val thatAccumSize = if (that.accumUpdates != null) that.accumUpdates.size else 0
+      if (accumSize != thatAccumSize) return false
+      if (accumSize > 0) {
+        val b = this.accumUpdates.keys.forall { key =>
--- End diff --

Why compare two Maps manually? I think `Map` already implements `equals`. 
Right?
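
For reference, a standalone illustration (not from the PR) of the structural equality that 
Scala's `Map` already provides:

```scala
// Scala collections compare by contents, so a manual key-by-key walk
// is unnecessary; insertion order and identity are irrelevant.
val a = Map(1L -> 10, 2L -> 20)
val b = Map(2L -> 20, 1L -> 10)
assert(a == b)                 // same entries: equal
assert(a != (a + (3L -> 30)))  // extra entry: not equal
```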





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166946807
  
**[Test build #48245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48245/consoleFull)** for PR 10391 at commit [`f5b9d50`](https://github.com/apache/spark/commit/f5b9d507df3946a18426061d3dc0d85e17cca80a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166946901
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166948261
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10391#issuecomment-166948262
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48246/
Test PASSed.





[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL] Incorrect resu...

2015-12-23 Thread yzhou2001
Github user yzhou2001 commented on the pull request:

https://github.com/apache/spark/pull/10450#issuecomment-166974291
  
This change is actually very close to https://github.com/apache/spark/pull/9548. Sorry, I 
did not look carefully at the code changes there, beyond the detailed explanations, before 
submitting this PR. So I am closing this now and will think more about the issue.





[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL] Incorrect resu...

2015-12-23 Thread yzhou2001
Github user yzhou2001 closed the pull request at:

https://github.com/apache/spark/pull/10450




