[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21147 somehow I thought it had passed tests and I had merged it to master... Anyway this is a pretty safe change and I don't think it will break any tests. Let's see the test result later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21147
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r184351841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -288,6 +288,114 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } +/** + * Checks if the two arrays contain at least one common element. + */ +@ExpressionDescription( + usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an element present also in a2.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5)); + true + """, since = "2.4.0") +case class ArraysOverlap(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + private lazy val elementType = inputTypes.head.asInstanceOf[ArrayType].elementType + + override def dataType: DataType = BooleanType + + override def inputTypes: Seq[AbstractDataType] = left.dataType match { --- End diff -- yea if implicit type cast is not allowed for these functions.
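For intuition, the contains-a-common-element semantics being reviewed can be sketched in plain Python. This is an illustrative model of the SQL behavior, not Spark's actual implementation, and it deliberately simplifies null handling inside the arrays:

```python
def arrays_overlap(a1, a2):
    """Return True when the two arrays share at least one element.

    Illustrative model of the semantics under review; Spark's real
    implementation also has rules for nulls inside the arrays, which
    this sketch deliberately simplifies.
    """
    if a1 is None or a2 is None:
        return None  # a null input yields null, as is conventional in SQL
    return bool(set(a1) & set(a2))

print(arrays_overlap([1, 2, 3], [3, 4, 5]))  # True, as in the doc example
```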
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89883/ Test FAILed.
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21157 **[Test build #89883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89883/testReport)** for PR 21157 at commit [`c67ce29`](https://github.com/apache/spark/commit/c67ce29a3279073812070e6ff4bb2e2624961b36). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21157 Merged build finished. Test FAILed.
[GitHub] spark issue #20140: [SPARK-19228][SQL] Introduce tryParseDate method to proc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20140 Can one of the admins verify this patch?
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89873/testReport)** for PR 21021 at commit [`f2798f9`](https://github.com/apache/spark/commit/f2798f9a6c2a804d6801fbc8b7dbef2755ae5359). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21136 **[Test build #89885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89885/testReport)** for PR 21136 at commit [`fed913d`](https://github.com/apache/spark/commit/fed913ddafbc07c0ab9b773402f19375420a31f4).
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89873/ Test PASSed.
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21106 I think `UnsafeProjectionCreator` is only useful for tests, because in production we should always try codegen first and fall back to interpretation if codegen fails. There is no need to disable the fallback (always codegen) or skip the codegen (always interpretation) in production. How about we provide a SQL conf, for tests only, that can disable the fallback (always codegen) or skip the codegen (always interpretation)?
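The codegen-first-with-interpreted-fallback policy described here, plus the proposed test-only switch, can be sketched as follows. All names are hypothetical stand-ins, not Spark's API:

```python
def compile_projection(expr):
    # Stand-in for the codegen path; pretend some expressions fail to compile.
    if expr == "bad":
        raise RuntimeError("codegen failed")
    return lambda row: ("compiled", expr)

def interpret_projection(expr):
    # Stand-in for the interpreted path, which always works.
    return lambda row: ("interpreted", expr)

def create_projection(expr, mode="fallback"):
    """mode: 'fallback' (production default), 'codegen_only' or
    'interpreted_only' (the proposed test-only conf values)."""
    if mode == "interpreted_only":
        return interpret_projection(expr)
    try:
        return compile_projection(expr)
    except Exception:
        if mode == "codegen_only":
            raise  # tests surface codegen bugs instead of silently hiding them
        return interpret_projection(expr)
```

In the default mode a codegen failure is invisible to the caller; the two test-only modes pin the behavior to a single path so each can be exercised deterministically.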
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21156 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89875/ Test PASSed.
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21156 Merged build finished. Test PASSed.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20936 Merged build finished. Test PASSed.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20936 **[Test build #89876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89876/testReport)** for PR 20936 at commit [`b1b9985`](https://github.com/apache/spark/commit/b1b9985a5b745625bd7606331fde9b05a9af9442). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21156 **[Test build #89875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89875/testReport)** for PR 21156 at commit [`a59c94f`](https://github.com/apache/spark/commit/a59c94f5b655fc034ce8907b98022cacf6bf318e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20936 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89876/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89896/testReport)** for PR 21014 at commit [`0038a23`](https://github.com/apache/spark/commit/0038a2304b8c07031a114c64531cfcbafc5085c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89896/ Test PASSed.
[GitHub] spark pull request #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to b...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21122#discussion_r184475841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -31,10 +30,16 @@ import org.apache.spark.util.ListenerBus * * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist. */ -abstract class ExternalCatalog - extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] { +trait ExternalCatalog { --- End diff -- Hopefully, the next release.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89889/testReport)** for PR 21068 at commit [`0ba8510`](https://github.com/apache/spark/commit/0ba85108584d4e2c5649679a10543f9d2cfe367c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89888/testReport)** for PR 21121 at commit [`bcd52bd`](https://github.com/apache/spark/commit/bcd52bd80f49fa8d2f32524044ed6c1a342b2bd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrayJoin(` * `case class Flatten(child: Expression) extends UnaryExpression ` * `case class MonthsBetween(` * `trait QueryPlanConstraints extends ConstraintHelper ` * `trait ConstraintHelper ` * `case class CachedRDDBuilder(` * `case class InMemoryRelation(` * `case class WriteToContinuousDataSource(` * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
[GitHub] spark pull request #21171: [SPARK-24104] SQLAppStatusListener overwrites met...
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/21171 [SPARK-24104] SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them ## What changes were proposed in this pull request? Event `SparkListenerDriverAccumUpdates` may happen multiple times in a query - e.g. every `FileSourceScanExec` and `BroadcastExchangeExec` call `postDriverMetricUpdates`. In Spark 2.2 `SQLListener` updated the map with new values. `SQLAppStatusListener` overwrites it. Unless `update` preserved it in the KV store (dependent on `exec.lastWriteTime`), only the metrics from the last operator that does `postDriverMetricUpdates` are preserved. ## How was this patch tested? Unit test added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/juliuszsompolski/apache-spark SPARK-24104 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21171 commit 69e09cc770514bdb7964b8552456bf7a83df7588 Author: Juliusz Sompolski Date: 2018-04-26T17:52:04Z onDriverAccumUpdates
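The essence of the bug: each driver-side event carries only that operator's metrics, so replacing the stored map drops earlier operators' values, while merging keeps them. A minimal model with illustrative names, not the actual listener code:

```python
class ExecMetrics:
    """Toy stand-in for the per-execution state kept by the listener."""

    def __init__(self):
        self.driver_accum_updates = {}

    def on_update_overwrite(self, updates):
        # Buggy behavior: the new event replaces everything seen so far.
        self.driver_accum_updates = dict(updates)

    def on_update_merge(self, updates):
        # Fixed behavior: accumulate metrics across events.
        self.driver_accum_updates.update(updates)

scan = {1: 100}  # e.g. metrics posted by a scan operator
bcast = {2: 7}   # e.g. metrics posted by a broadcast exchange

buggy, fixed = ExecMetrics(), ExecMetrics()
for ev in (scan, bcast):
    buggy.on_update_overwrite(ev)
    fixed.on_update_merge(ev)

print(buggy.driver_accum_updates)  # {2: 7} - the scan's metrics are lost
print(fixed.driver_accum_updates)  # {1: 100, 2: 7}
```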
[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21145 IMO, it's better to keep the current naming. `DataReaderFactory` implies something that produces a `DataReader`, which it does, whereas `ReadTask` gives the impression that it does some reading itself, when what it really does is `createDataReader`. The current naming keeps the read and write sides consistent, and I think the renaming would add to the already confusing interfaces.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89896/testReport)** for PR 21014 at commit [`0038a23`](https://github.com/apache/spark/commit/0038a2304b8c07031a114c64531cfcbafc5085c9).
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89894/testReport)** for PR 17086 at commit [`6906dc4`](https://github.com/apache/spark/commit/6906dc4a9fb8d6efde9fb1763c79ad9381fc8dea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89889/ Test FAILed.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Merged build finished. Test FAILed.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89898/testReport)** for PR 21068 at commit [`4df2311`](https://github.com/apache/spark/commit/4df231177343e6be04ec76d8c65e886763a5a152).
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r184466157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,78 @@ case class Flatten(child: Expression) extends UnaryExpression { override def prettyName: String = "flatten" } + +/** + * Removes duplicate values from the array. + */ +@ExpressionDescription( + usage = "_FUNC_(array) - Removes duplicate values from the array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3)); + [1,2,3,null] + """, since = "2.4.0") +case class ArrayDistinct(child: Expression) + extends UnaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType) + + override def dataType: DataType = child.dataType + + override def nullSafeEval(array: Any): Any = { +val elementType = child.dataType.asInstanceOf[ArrayType].elementType +val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType).distinct +new GenericArrayData(data.asInstanceOf[Array[Any]]) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val elementType = dataType.asInstanceOf[ArrayType].elementType +nullSafeCodeGen(ctx, ev, (array) => { + val arrayClass = classOf[GenericArrayData].getName + val tempArray = ctx.freshName("tempArray") + val distinctArray = ctx.freshName("distinctArray") + val i = ctx.freshName("i") + val j = ctx.freshName("j") + val pos = ctx.freshName("arrayPosition") + val getValue1 = CodeGenerator.getValue(array, elementType, i) + val getValue2 = CodeGenerator.getValue(array, elementType, j) + s""" + |int $pos = 0; + |Object[] $tempArray = new Object[$array.numElements()]; + |for (int $i = 0; $i < $array.numElements(); $i ++) { + | if ($array.isNullAt($i)) { + | int $j; + | for ($j = 0; $j < $i; $j ++) { + | if ($array.isNullAt($j)) + | break; + | } + | if ($i == $j) { + | $tempArray[$pos] = null; + | $pos = $pos + 
1; + | } + | } + | else { + |int $j; + |for ($j = 0; $j < $i; $j ++) { + | if (${ctx.genEqual(elementType, getValue1, getValue2)}) --- End diff -- Shouldn't you check `$array.isNullAt($j)` in this loop as well? Especially when `elementType` is primitive?
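Behaviorally, `array_distinct` keeps the first occurrence of each value, including at most one null; the interpreted path above is essentially Scala's `.distinct`. A small Python model of the intended result (not the generated Java being reviewed):

```python
def array_distinct(arr):
    """Remove duplicates while preserving first-occurrence order.

    None models SQL NULL; as with the null scan in the generated code,
    at most one None survives. Quadratic on purpose, mirroring the
    nested i/j loops in the codegen above.
    """
    if arr is None:
        return None
    out = []
    for x in arr:
        if x not in out:  # linear scan over kept elements, like the j-loop
            out.append(x)
    return out

print(array_distinct([1, 2, 3, None, 3]))  # [1, 2, 3, None], as in the doc example
```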
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89894/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89888/ Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21171 cc @gengliangwang @vanzin
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21171 **[Test build #89897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89897/testReport)** for PR 21171 at commit [`69e09cc`](https://github.com/apache/spark/commit/69e09cc770514bdb7964b8552456bf7a83df7588).
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Merged build finished. Test PASSed.
[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21168 Can one of the admins verify this patch?
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184425723 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 +val tpRate0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val tpRate1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val tpRate2 = (1.0 * w2) / (1.0 * w2 + 0) +val fpRate0 = (1.0 * w2) / (tw - (2.0 * w1 + 1.0 * w2 + 1.0 * w1)) +val fpRate1 = (1.0 * w2) / (tw - (1.0 * w2 + 2.0 * w1 + 1.0 * w2)) +val fpRate2 = (1.0 * w1) / (tw - (1.0 * w2)) +val precision0 = (2.0 * w1) / (2 * w1 + 1 * w2) +val precision1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val precision2 = (1.0 * w2) / (1 * w1 + 1 * w2) +val recall0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val recall1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) 
+val recall2 = (1.0 * w2) / (1.0 * w2 + 0) +val f1measure0 = 2 * precision0 * recall0 / (precision0 + recall0) +val f1measure1 = 2 * precision1 * recall1 / (precision1 + recall1) +val f1measure2 = 2 * precision2 * recall2 / (precision2 + recall2) +val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0) +val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1) +val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2) + + assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray)) +assert(math.abs(metrics.truePositiveRate(0.0) - tpRate0) < delta) --- End diff -- done
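The hand-computed expectations in the test above all follow one pattern: weighted true positives divided by the weighted predicted total (precision) or the weighted actual total (recall). A compact Python check of that arithmetic, using the same data and weights as the test:

```python
def weighted_prf(rows, cls):
    """Weighted per-class precision/recall/F1 from
    (prediction, label, weight) rows, as in the test data above."""
    tp = sum(w for p, l, w in rows if p == cls and l == cls)
    pred_w = sum(w for p, l, w in rows if p == cls)    # weight of predictions == cls
    label_w = sum(w for p, l, w in rows if l == cls)   # weight of labels == cls
    precision, recall = tp / pred_w, tp / label_w
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

w1, w2 = 2.2, 1.5
rows = [(0, 0, w1), (0, 1, w2), (0, 0, w1), (1, 0, w2), (1, 1, w1),
        (1, 1, w2), (1, 1, w1), (2, 2, w2), (2, 0, w1)]

p0, r0, _ = weighted_prf(rows, 0)
# These match precision0 and recall0 as written in the Scala test:
assert abs(p0 - 2 * w1 / (2 * w1 + w2)) < 1e-9
assert abs(r0 - 2 * w1 / (2 * w1 + w2 + w1)) < 1e-9
```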
[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21158 > I was looking for documentation on these sql confs That's part of this fix, if you're talking about the old config. Since it was using `ConfigBuilder` directly instead of `buildConf`, it did not make it into the output of "SET -v". If that was intentional then probably that part should be reverted - I don't see why it should be hidden, but maybe @gatorsmile had a reason.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184439831 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Also, I don't think this can return `null`. `getMethod` throws `NoSuchMethodException` when the method is not found; otherwise it returns the method.
[GitHub] spark issue #21165: [Spark 20087][CORE] Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > we should document the changes in a migration document or something, I think documentation is necessary, will update the documentation tomorrow (Beijing time)
[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21169 Merged build finished. Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89892/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89892/testReport)** for PR 21014 at commit [`2260e15`](https://github.com/apache/spark/commit/2260e155c79a843ade705501209c43b6aee8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184451743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { + + override def checkInputDataTypes(): TypeCheckResult = { +// this check currently does not allow valueContainsNull to vary, +// and unfortunately none of the MapType toString methods include +// valueContainsNull for the error message +if (children.size < 2) { + TypeCheckResult.TypeCheckFailure( +s"$prettyName expects at least two input maps.") +} else if (children.exists(!_.dataType.isInstanceOf[MapType])) { + TypeCheckResult.TypeCheckFailure( +s"The given input of function $prettyName should all be of type map, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else if (children.map(_.dataType).distinct.length > 1) { + TypeCheckResult.TypeCheckFailure( +s"The given input maps of function $prettyName should all be the same type, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: MapType = { +children.headOption.map(_.dataType.asInstanceOf[MapType]) + .getOrElse(MapType(keyType = StringType, valueType = StringType)) + } + + override def nullable: Boolean = true + + override def eval(input: InternalRow): Any = { +val union = new util.LinkedHashMap[Any, Any]() +children.map(_.eval(input)).foreach { 
raw => + if (raw == null) { +return null + } + val map = raw.asInstanceOf[MapData] + map.foreach(dataType.keyType, dataType.valueType, (k, v) => +union.put(k, v) + ) +} +val (keyArray, valueArray) = union.entrySet().toArray().map { e => + val e2 = e.asInstanceOf[java.util.Map.Entry[Any, Any]] + (e2.getKey, e2.getValue) +}.unzip +new ArrayBasedMapData(new GenericArrayData(keyArray), new GenericArrayData(valueArray)) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val mapCodes = children.map(c => c.genCode(ctx)) +val keyType = children.head.dataType.asInstanceOf[MapType].keyType +val valueType = children.head.dataType.asInstanceOf[MapType].valueType +val mapRefArrayName = ctx.freshName("mapRefArray") +val unionMapName = ctx.freshName("union") + +val mapDataClass = classOf[MapData].getName +val arrayBasedMapDataClass = classOf[ArrayBasedMapData].getName +val arrayDataClass = classOf[ArrayData].getName +val genericArrayDataClass = classOf[GenericArrayData].getName +val hashMapClass = classOf[util.LinkedHashMap[Any, Any]].getName +val entryClass = classOf[util.Map.Entry[Any, Any]].getName + +val init = + s""" +|$mapDataClass[] $mapRefArrayName = new $mapDataClass[${mapCodes.size}]; +|boolean ${ev.isNull} = false; +|$mapDataClass ${ev.value} = null; + """.stripMargin + +val assignments = mapCodes.zipWithIndex.map { case (m, i) => + val initCode = mapCodes(i).code + val valueVarName = mapCodes(i).value.code + s""" + |$initCode + |$mapRefArrayName[$i] = $valueVarName; + |if ($valueVarName == null) { + | ${ev.isNull} = true; + |} + """.stripMargin +}.mkString("\n") + +val index1Name = ctx.freshName("idx1") +val index2Name = ctx.freshName("idx2") +val mapDataName = ctx.freshName("m") +val kaName = ctx.freshName("ka") +val vaName = ctx.freshName("va") +val keyName = ctx.freshName("key") +val valueName = ctx.freshName("value") + +val mapMerge = + s""" +|$hashMapClass
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r18276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { --- End diff -- Do you need to inherit from `CodegenFallback` if you've overridden `doGenCode`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
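The eval() path in the MapConcat implementation above puts every entry of every input map into a LinkedHashMap, so on duplicate keys the last map wins, and any null input makes the whole result null. A minimal Python sketch of those merge semantics (illustrative only; the real code operates on Spark's MapData, and the function name here is just a stand-in):

```python
def map_concat(*maps):
    """Union of the given maps. Later maps overwrite earlier keys,
    mirroring LinkedHashMap.put in the eval() above; a null (None)
    input nulls the entire result."""
    union = {}  # Python dicts preserve insertion order, like LinkedHashMap
    for m in maps:
        if m is None:
            return None
        union.update(m)  # duplicate keys take the later map's value
    return union
```

This reproduces the documented example: merging `map(1, 'a', 2, 'b')` with `map(2, 'c', 3, 'd')` yields key 2 bound to 'c'.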
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184452750 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -56,6 +58,28 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(MapValues(m2), null) } + test("Map Concat") { --- End diff -- Please add more test cases with `null` values.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184452242 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) --- End diff -- since = "2.4.0"
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184435943 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { + + override def checkInputDataTypes(): TypeCheckResult = { +// this check currently does not allow valueContainsNull to vary, +// and unfortunately none of the MapType toString methods include +// valueContainsNull for the error message +if (children.size < 2) { + TypeCheckResult.TypeCheckFailure( +s"$prettyName expects at least two input maps.") +} else if (children.exists(!_.dataType.isInstanceOf[MapType])) { + TypeCheckResult.TypeCheckFailure( +s"The given input of function $prettyName should all be of type map, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else if (children.map(_.dataType).distinct.length > 1) { + TypeCheckResult.TypeCheckFailure( +s"The given input maps of function $prettyName should all be the same type, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: MapType = { +children.headOption.map(_.dataType.asInstanceOf[MapType]) + .getOrElse(MapType(keyType = StringType, valueType = StringType)) + } + + override def nullable: Boolean = true --- End diff -- What about `override def nullable: Boolean = children.exists(_.nullable)`? 
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2699/ Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89886/ Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89887/testReport)** for PR 21121 at commit [`51c8199`](https://github.com/apache/spark/commit/51c8199623837de6d2eeb5012d95635e056d4c04). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrayJoin(` * `case class Flatten(child: Expression) extends UnaryExpression ` * `case class MonthsBetween(` * `trait QueryPlanConstraints extends ConstraintHelper ` * `trait ConstraintHelper ` * `case class CachedRDDBuilder(` * `case class InMemoryRelation(` * `case class WriteToContinuousDataSource(` * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89887/ Test FAILed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test FAILed.
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r184471365 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,78 @@ case class Flatten(child: Expression) extends UnaryExpression { override def prettyName: String = "flatten" } + +/** + * Removes duplicate values from the array. + */ +@ExpressionDescription( + usage = "_FUNC_(array) - Removes duplicate values from the array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3)); + [1,2,3,null] + """, since = "2.4.0") +case class ArrayDistinct(child: Expression) + extends UnaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType) + + override def dataType: DataType = child.dataType + + override def nullSafeEval(array: Any): Any = { +val elementType = child.dataType.asInstanceOf[ArrayType].elementType +val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType).distinct +new GenericArrayData(data.asInstanceOf[Array[Any]]) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val elementType = dataType.asInstanceOf[ArrayType].elementType +nullSafeCodeGen(ctx, ev, (array) => { + val arrayClass = classOf[GenericArrayData].getName + val tempArray = ctx.freshName("tempArray") + val distinctArray = ctx.freshName("distinctArray") + val i = ctx.freshName("i") + val j = ctx.freshName("j") + val pos = ctx.freshName("arrayPosition") + val getValue1 = CodeGenerator.getValue(array, elementType, i) + val getValue2 = CodeGenerator.getValue(array, elementType, j) + s""" + |int $pos = 0; + |Object[] $tempArray = new Object[$array.numElements()]; --- End diff -- Just wondering about cases with big arrays with lots of duplicated values. In such cases, `$tempArray` is unnecessarily big. What about performing the filtering in two runs?
The first run would calculate the result array size and the second would copy items from the source to the result.
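The two-run approach suggested here can be sketched as follows (a hypothetical Python illustration of the idea, not the actual generated Java): the first pass only counts the surviving elements so the result can be allocated at its exact size, and the second pass copies them.

```python
def array_distinct_two_pass(arr):
    """Remove duplicates in two passes: pass one sizes the result,
    pass two copies first occurrences, so no temp buffer as large as
    the input is needed. Quadratic equality checks mirror the nested
    loops in the codegen under review."""
    def is_first_occurrence(i):
        # True when arr[i] did not already appear at an earlier index.
        return all(arr[j] != arr[i] for j in range(i))

    # Pass 1: count distinct elements.
    n_distinct = sum(1 for i in range(len(arr)) if is_first_occurrence(i))

    # Pass 2: copy first occurrences into an exactly-sized result.
    result = [None] * n_distinct
    pos = 0
    for i in range(len(arr)):
        if is_first_occurrence(i):
            result[pos] = arr[i]
            pos += 1
    return result
```

The trade-off being weighed is memory versus time: this variant scans the source twice but never over-allocates.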
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2700/ Test PASSed.
[GitHub] spark pull request #20936: [SPARK-23503][SS] Enforce sequencing of committed...
Github user efimpoberezkin commented on a diff in the pull request: https://github.com/apache/spark/pull/20936#discussion_r184361843 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochCoordinator.scala --- @@ -137,30 +137,65 @@ private[continuous] class EpochCoordinator( private val partitionOffsets = mutable.Map[(Long, Int), PartitionOffset]() + private var lastCommittedEpoch = startEpoch - 1 + // Remembers epochs that have to wait for previous epochs to be committed first. + private val epochsWaitingToBeCommitted = mutable.HashSet.empty[Long] + private def resolveCommitsAtEpoch(epoch: Long) = { -val thisEpochCommits = - partitionCommits.collect { case ((e, _), msg) if e == epoch => msg } +val thisEpochCommits = findCommitsForEpoch(epoch) val nextEpochOffsets = partitionOffsets.collect { case ((e, _), o) if e == epoch => o } if (thisEpochCommits.size == numWriterPartitions && nextEpochOffsets.size == numReaderPartitions) { - logDebug(s"Epoch $epoch has received commits from all partitions. Committing globally.") - // Sequencing is important here. We must commit to the writer before recording the commit - // in the query, or we will end up dropping the commit if we restart in the middle. - writer.commit(epoch, thisEpochCommits.toArray) - query.commit(epoch) - - // Cleanup state from before this epoch, now that we know all partitions are forever past it. - for (k <- partitionCommits.keys.filter { case (e, _) => e < epoch }) { -partitionCommits.remove(k) - } - for (k <- partitionOffsets.keys.filter { case (e, _) => e < epoch }) { -partitionOffsets.remove(k) + + // Check that last committed epoch is the previous one for sequencing of committed epochs. + // If not, add the epoch being currently processed to epochs waiting to be committed, + // otherwise commit it. 
+ if (lastCommittedEpoch != epoch - 1) { +logDebug(s"Epoch $epoch has received commits from all partitions " + + s"and is waiting for epoch ${epoch - 1} to be committed first.") +epochsWaitingToBeCommitted.add(epoch) + } else { +commitEpoch(epoch, thisEpochCommits) +lastCommittedEpoch = epoch + +// Commit subsequent epochs that are waiting to be committed. +var nextEpoch = lastCommittedEpoch + 1 +while (epochsWaitingToBeCommitted.contains(nextEpoch)) { + val nextEpochCommits = findCommitsForEpoch(nextEpoch) + commitEpoch(nextEpoch, nextEpochCommits) + + epochsWaitingToBeCommitted.remove(nextEpoch) + lastCommittedEpoch = nextEpoch + nextEpoch += 1 +} + +// Cleanup state from before last committed epoch, +// now that we know all partitions are forever past it. +for (k <- partitionCommits.keys.filter { case (e, _) => e < lastCommittedEpoch }) { + partitionCommits.remove(k) +} +for (k <- partitionOffsets.keys.filter { case (e, _) => e < lastCommittedEpoch }) { + partitionOffsets.remove(k) +} } } } + private def findCommitsForEpoch(epoch: Long): Iterable[WriterCommitMessage] = { --- End diff -- @tdas done
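The sequencing logic under review (commit an epoch only after its predecessor has been committed, park it otherwise, then flush any consecutive parked epochs once the gap closes) can be sketched outside Spark. `EpochSequencer` below is a made-up stand-in for `EpochCoordinator`, with a plain callback in place of the writer/query commit:

```python
class EpochSequencer:
    """Commits epochs strictly in order. An epoch whose predecessor has
    not yet been committed is parked in `waiting` and flushed later."""

    def __init__(self, start_epoch, commit_fn):
        self.last_committed = start_epoch - 1
        self.waiting = set()       # epochs ready to commit but out of order
        self.commit_fn = commit_fn

    def epoch_ready(self, epoch):
        """Called when all partitions of `epoch` have reported commits."""
        if epoch != self.last_committed + 1:
            # Predecessor not committed yet: park this epoch.
            self.waiting.add(epoch)
            return
        self.commit_fn(epoch)
        self.last_committed = epoch
        # Flush any consecutive epochs that were waiting on this one.
        while self.last_committed + 1 in self.waiting:
            nxt = self.last_committed + 1
            self.waiting.remove(nxt)
            self.commit_fn(nxt)
            self.last_committed = nxt
```

Even if epochs 1 and 2 become ready before epoch 0, the commit callback still observes the order 0, 1, 2.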
[GitHub] spark issue #21167: [SPARK-24100][PYSPARK][DStreams]Add the CompressionCodec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21167 Can one of the admins verify this patch?
[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21152 **[Test build #89877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89877/testReport)** for PR 21152 at commit [`3eee5c9`](https://github.com/apache/spark/commit/3eee5c99e891906015cc5fcae09acaae0091da86). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21158 Looks good. I didn't have a jdbc connection handy to test with. I was looking for documentation on these sql confs and I don't see them in the SET -v output, did we decide not to document these? I haven't looked to see how that works.
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184420934 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 --- End diff -- done
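The weighted metrics this test exercises generalize the unweighted ones by summing instance weights instead of counting instances: each (prediction, label) pair contributes its weight to the true-positive, false-positive, or false-negative total of a class. A small Python sketch of per-class weighted precision and recall (a hypothetical helper, not part of the MulticlassMetrics API):

```python
def weighted_precision_recall(rows, cls):
    """rows: iterable of (prediction, label, weight) triples.
    Returns (precision, recall) for class `cls`, with each row
    contributing its weight rather than a count of 1."""
    tp = fp = fn = 0.0
    for pred, label, w in rows:
        if pred == cls and label == cls:
            tp += w          # correctly predicted as cls
        elif pred == cls:
            fp += w          # predicted cls, but label differs
        elif label == cls:
            fn += w          # was cls, but predicted otherwise
    return tp / (tp + fp), tp / (tp + fn)
```

On the test's data with w1 = 2.2 and w2 = 1.5, class 0 gets precision 2*w1 / (2*w1 + w2) and recall 2*w1 / (2*w1 + w2 + w1), matching the expected values in the suite above.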
[GitHub] spark issue #21159: [SPARK-24057][PYTHON]put the real data type in the Asser...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21159 merged to master, thanks @huaxingao !
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Can one of the admins verify this patch?
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89903/testReport)** for PR 21068 at commit [`17bbbee`](https://github.com/apache/spark/commit/17bbbee0cf952a32e44fd0767bba08814e351da2).
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21171 **[Test build #89897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89897/testReport)** for PR 21171 at commit [`69e09cc`](https://github.com/apache/spark/commit/69e09cc770514bdb7964b8552456bf7a83df7588). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89897/ Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Merged build finished. Test PASSed.
[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21136#discussion_r184528290 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -771,6 +778,16 @@ class UnsupportedOperationsSuite extends SparkFunSuite { } } + /** Assert that the logical plan is not supportd for continuous procsssing mode */ --- End diff -- is not supported -> is supported
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21095 @madanadit sorry I didn't get a chance to look into this. I will take a detailed look once @foxish's comments on testing are addressed.
[GitHub] spark pull request #21159: [SPARK-24057][PYTHON]put the real data type in th...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21159
[GitHub] spark issue #21159: [SPARK-24057][PYTHON]put the real data type in the Asser...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21159 Thanks @BryanCutler @HyukjinKwon
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89885/ Test PASSed.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21136 Merged build finished. Test PASSed.
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184430271 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 +val tpRate0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val tpRate1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val tpRate2 = (1.0 * w2) / (1.0 * w2 + 0) +val fpRate0 = (1.0 * w2) / (tw - (2.0 * w1 + 1.0 * w2 + 1.0 * w1)) +val fpRate1 = (1.0 * w2) / (tw - (1.0 * w2 + 2.0 * w1 + 1.0 * w2)) +val fpRate2 = (1.0 * w1) / (tw - (1.0 * w2)) +val precision0 = (2.0 * w1) / (2 * w1 + 1 * w2) +val precision1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val precision2 = (1.0 * w2) / (1 * w1 + 1 * w2) +val recall0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val recall1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) 
+val recall2 = (1.0 * w2) / (1.0 * w2 + 0) +val f1measure0 = 2 * precision0 * recall0 / (precision0 + recall0) +val f1measure1 = 2 * precision1 * recall1 / (precision1 + recall1) +val f1measure2 = 2 * precision2 * recall2 / (precision2 + recall2) +val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0) +val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1) +val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2) + + assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray)) +assert(math.abs(metrics.truePositiveRate(0.0) - tpRate0) < delta) +assert(math.abs(metrics.truePositiveRate(1.0) - tpRate1) < delta) +assert(math.abs(metrics.truePositiveRate(2.0) - tpRate2) < delta) +assert(math.abs(metrics.falsePositiveRate(0.0) - fpRate0) < delta) +assert(math.abs(metrics.falsePositiveRate(1.0) - fpRate1) < delta) +assert(math.abs(metrics.falsePositiveRate(2.0) - fpRate2) < delta) +assert(math.abs(metrics.precision(0.0) - precision0) < delta) +assert(math.abs(metrics.precision(1.0) - precision1) < delta) +assert(math.abs(metrics.precision(2.0) - precision2) < delta) +assert(math.abs(metrics.recall(0.0) - recall0) < delta) +assert(math.abs(metrics.recall(1.0) - recall1) < delta) +assert(math.abs(metrics.recall(2.0) - recall2) < delta) +assert(math.abs(metrics.fMeasure(0.0) - f1measure0) < delta) +assert(math.abs(metrics.fMeasure(1.0) - f1measure1) < delta) +assert(math.abs(metrics.fMeasure(2.0) - f1measure2) < delta) +assert(math.abs(metrics.fMeasure(0.0, 2.0) - f2measure0) < delta) +assert(math.abs(metrics.fMeasure(1.0, 2.0) - f2measure1) < delta) +assert(math.abs(metrics.fMeasure(2.0, 2.0) - f2measure2) < delta) + +assert(math.abs(metrics.accuracy - + (2.0 * w1 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2) / tw) < delta) +assert(math.abs(metrics.accuracy - metrics.precision) < delta) +assert(math.abs(metrics.accuracy - metrics.recall) < delta) +assert(math.abs(metrics.accuracy - 
metrics.fMeasure) < delta) +assert(math.abs(metrics.accuracy - metrics.weightedRecall) < delta) +assert(math.abs(metrics.weightedTruePositiveRate - + (((2 * w1 + 1 * w2 + 1 * w1) / tw) * tpRate0 + +((1 * w2 + 2 * w1 + 1 * w2) / tw) * tpRate1 + +(1 * w2 / tw) * tpRate2)) < delta) +assert(math.abs(metrics.weightedFalsePositiveRate - + (((2
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test FAILed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2694/ Test FAILed.
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184438550 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1417,10 +1417,13 @@ class DAGScheduler( case exceptionFailure: ExceptionFailure => // Nothing left to do, already handled above for accumulator updates. + case _: TaskKilled => --- End diff -- nit: combine this with the `ExceptionFailure` case.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user eric-maynard commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184442769 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Yes, I think you are definitely correct. It cannot be null. @vanzin is right, and the check should instead ensure the `main` being invoked is static. PR is updated, but I may decline as something is wrong with the way I am testing.
[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21169 [SPARK-23715][SQL] the input of to/from_utc_timestamp can not have timezone ## What changes were proposed in this pull request? `from_utc_timestamp` assumes its input is in the UTC timezone and shifts it to the specified timezone. When the timestamp contains a timezone (e.g. `2018-03-13T06:18:23+00:00`), Spark breaks this semantic and respects the timezone in the string. This is not what users expect, and the result is different from Hive/Scala. `to_utc_timestamp` has the same problem. This PR fixes this by returning null if the input timestamp contains a timezone. TODO: add a config ## How was this patch tested? new tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark from_utc_timezone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21169 commit 7c1dcc3f3c144fe2aa1296c84840ff27a5a250e1 Author: Wenchen Fan Date: 2018-04-26T16:01:38Z SPARK-23715: the input of to/from_utc_timestamp can not have timezone
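The proposed semantics, rejecting inputs that carry their own timezone offset, can be sketched with plain `java.time`. This is a hypothetical illustration of the idea only, not Spark's implementation; the helper name is made up, and the actual shift from UTC to a target timezone is omitted:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

public class UtcTimestampSketch {
    // If the string carries an explicit offset such as "+00:00", return null
    // (the behavior this PR proposes); otherwise treat the string as a UTC
    // wall-clock time.
    static LocalDateTime fromUtcTimestamp(String s) {
        try {
            OffsetDateTime.parse(s); // succeeds only when an offset is present
            return null;             // timezone in the input -> null result
        } catch (DateTimeParseException e) {
            return LocalDateTime.parse(s); // no offset: plain wall-clock time
        }
    }
}
```

The key point mirrored here is that an offset in the string changes the outcome entirely, rather than being silently honored.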
[GitHub] spark pull request #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala t...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/21170 [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString error ## What changes were proposed in this pull request? Fix `memoryV2.scala` toString error ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22732 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21170
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/20930#discussion_r184462542 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,84 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLimits {
     }
   }

+  /**
+   * This tests the case where the original task succeeds after its speculative
+   * attempt got a FetchFailed earlier.
+   */
+  test("SPARK-23811: ShuffleMapStage failed by FetchFailed should ignore following " +
+    "successful tasks") {
+    // Create 3 RDDs with shuffle dependencies on each other: rddA <--- rddB <--- rddC
+    val rddA = new MyRDD(sc, 2, Nil)
+    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+    val shuffleIdA = shuffleDepA.shuffleId
+
+    val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = mapOutputTracker)
+    val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+    val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = mapOutputTracker)
+
+    submit(rddC, Array(0, 1))
+
+    // Complete both tasks in rddA.
+    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 2)),
+      (Success, makeMapStatus("hostB", 2))))
+
+    // The first task succeeds.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+    // The second task's speculative attempt fails first, but the task itself is
+    // still running. This may be caused by ExecutorLost.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0, "ignored"),
+      null))
+    // Check the currently missing partition.
+    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 1)
+    // The second task itself succeeds soon after.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
+    // The number of missing partitions should not change, otherwise the child
+    // stage can never succeed.
+    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 1)
+  }
+
+  test("SPARK-23811: check ResultStage failed by FetchFailed can ignore following " +
+    "successful tasks") {
+    val rddA = new MyRDD(sc, 2, Nil)
+    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+    val shuffleIdA = shuffleDepA.shuffleId
+    val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = mapOutputTracker)
+    submit(rddB, Array(0, 1))
+
+    // Complete both tasks in rddA.
+    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 2)),
+      (Success, makeMapStatus("hostB", 2))))
+
+    // The first task of rddB succeeds.
+    assert(taskSets(1).tasks(0).isInstanceOf[ResultTask[_, _]])
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+    // The second task's speculative attempt fails first, but the task itself is
+    // still running. This may be caused by ExecutorLost.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0, "ignored"),
+      null))
+    // Make sure failedStages is not empty now.
+    assert(scheduler.failedStages.nonEmpty)
+    // The second result task itself succeeds soon after.
+    assert(taskSets(1).tasks(1).isInstanceOf[ResultTask[_, _]])
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
+    assertDataStructuresEmpty()
--- End diff --
Right. It is a check that we are cleaning up the contents of the DAGScheduler's data structures so that they do not grow without bound over time.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184428171 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Can this be `null`?
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2695/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89891/testReport)** for PR 17086 at commit [`089d64b`](https://github.com/apache/spark/commit/089d64bca76b26f0f90a6dd591fc4218b8ab3700).
[GitHub] spark issue #21165: Spark 20087: Attach accumulators / metrics to 'TaskKille...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21165 I'm not against the change, but since this changes the semantics of accumulators, we should document the changes in a migration document or something, WDYT @cloud-fan @gatorsmile ?
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184440128 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi -case class TaskKilled(reason: String) extends TaskFailedReason { - override def toErrorString: String = s"TaskKilled ($reason)" +case class TaskKilled( +reason: String, +accumUpdates: Seq[AccumulableInfo] = Seq.empty, +private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil) + extends TaskFailedReason { + + override def toErrorString: String = "TaskKilled ($reason)" --- End diff -- Will do. Didn't notice the `s` is missing.
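The bug caught above is Scala's silent acceptance of `$reason` inside a plain string literal when the `s` interpolator prefix is dropped: the text is emitted verbatim instead of being substituted. The same failure mode can be mimicked in Java by forgetting to route a template through `String.format` (names here are illustrative, not Spark code):

```java
public class InterpolationSlip {
    // The placeholder is never substituted; analogous to a Scala string
    // literal missing its `s` interpolator prefix. Note the argument is
    // silently ignored.
    static String wrong(String reason) {
        return "TaskKilled (%s)";
    }

    // The intended behavior: the reason is substituted into the template.
    static String right(String reason) {
        return String.format("TaskKilled (%s)", reason);
    }
}
```

In both languages the compiler accepts the broken version, which is why this slip survives until someone reads the output.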
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184441076 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1417,10 +1417,13 @@ class DAGScheduler( case exceptionFailure: ExceptionFailure => // Nothing left to do, already handled above for accumulator updates. + case _: TaskKilled => --- End diff -- will do.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89892/testReport)** for PR 21014 at commit [`2260e15`](https://github.com/apache/spark/commit/2260e155c79a843ade705501209c43b6aee8).
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > It should be [Spark-20087] instead of [Spark 20087] in the title.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89894/testReport)** for PR 17086 at commit [`6906dc4`](https://github.com/apache/spark/commit/6906dc4a9fb8d6efde9fb1763c79ad9381fc8dea).
[GitHub] spark issue #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21170 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2698/ Test PASSed.