[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21147 somehow I thought it had passed tests and I had merged it to master... Anyway this is a pretty safe change and I don't think it will break any tests. Let's see the test result later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21147
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r184351841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -288,6 +288,114 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } +/** + * Checks if the two arrays contain at least one common element. + */ +@ExpressionDescription( + usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an element present also in a2.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5)); + true + """, since = "2.4.0") +case class ArraysOverlap(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + private lazy val elementType = inputTypes.head.asInstanceOf[ArrayType].elementType + + override def dataType: DataType = BooleanType + + override def inputTypes: Seq[AbstractDataType] = left.dataType match { --- End diff -- yea if implicit type cast is not allowed for these functions.
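For intuition, the contains-a-common-element semantics being reviewed can be sketched in plain Python. This is an illustrative model of the SQL behavior, not Spark's actual implementation, and it deliberately simplifies null handling inside the arrays:

```python
def arrays_overlap(a1, a2):
    """Return True when the two arrays share at least one element.

    Illustrative model of the semantics under review; Spark's real
    implementation also has rules for nulls inside the arrays, which
    this sketch deliberately simplifies.
    """
    if a1 is None or a2 is None:
        return None  # a null input yields null, as is conventional in SQL
    return bool(set(a1) & set(a2))

print(arrays_overlap([1, 2, 3], [3, 4, 5]))  # True, as in the doc example
```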
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89883/ Test FAILed.
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21157 **[Test build #89883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89883/testReport)** for PR 21157 at commit [`c67ce29`](https://github.com/apache/spark/commit/c67ce29a3279073812070e6ff4bb2e2624961b36). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21157 Merged build finished. Test FAILed.
[GitHub] spark issue #20140: [SPARK-19228][SQL] Introduce tryParseDate method to proc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20140 Can one of the admins verify this patch?
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89873/testReport)** for PR 21021 at commit [`f2798f9`](https://github.com/apache/spark/commit/f2798f9a6c2a804d6801fbc8b7dbef2755ae5359). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21136 **[Test build #89885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89885/testReport)** for PR 21136 at commit [`fed913d`](https://github.com/apache/spark/commit/fed913ddafbc07c0ab9b773402f19375420a31f4).
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89873/ Test PASSed.
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21106 I think `UnsafeProjectionCreator` is only useful for tests, because in production we should always try codegen first and fall back to interpretation if codegen fails. There is no need to disable the fallback (always codegen) or skip the codegen (always interpretation) in production. How about we provide a SQL conf, for tests only, that can disable the fallback (always codegen) or skip the codegen (always interpretation)?
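The codegen-first-with-interpreted-fallback policy described here, plus the proposed test-only switch, can be sketched as follows. All names are hypothetical stand-ins, not Spark's API:

```python
def compile_projection(expr):
    # Stand-in for the codegen path; pretend some expressions fail to compile.
    if expr == "bad":
        raise RuntimeError("codegen failed")
    return lambda row: ("compiled", expr)

def interpret_projection(expr):
    # Stand-in for the interpreted path, which always works.
    return lambda row: ("interpreted", expr)

def create_projection(expr, mode="fallback"):
    """mode: 'fallback' (production default), 'codegen_only' or
    'interpreted_only' (the proposed test-only conf values)."""
    if mode == "interpreted_only":
        return interpret_projection(expr)
    try:
        return compile_projection(expr)
    except Exception:
        if mode == "codegen_only":
            raise  # tests surface codegen bugs instead of silently hiding them
        return interpret_projection(expr)
```

In the default mode a codegen failure is invisible to the caller; the two test-only modes pin the behavior to a single path so each can be exercised deterministically.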
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21156 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89875/ Test PASSed.
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21156 Merged build finished. Test PASSed.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20936 Merged build finished. Test PASSed.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20936 **[Test build #89876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89876/testReport)** for PR 20936 at commit [`b1b9985`](https://github.com/apache/spark/commit/b1b9985a5b745625bd7606331fde9b05a9af9442). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21156 **[Test build #89875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89875/testReport)** for PR 21156 at commit [`a59c94f`](https://github.com/apache/spark/commit/a59c94f5b655fc034ce8907b98022cacf6bf318e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20936: [SPARK-23503][SS] Enforce sequencing of committed epochs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20936 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89876/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89896/testReport)** for PR 21014 at commit [`0038a23`](https://github.com/apache/spark/commit/0038a2304b8c07031a114c64531cfcbafc5085c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89896/ Test PASSed.
[GitHub] spark pull request #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to b...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21122#discussion_r184475841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -31,10 +30,16 @@ import org.apache.spark.util.ListenerBus * * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist. */ -abstract class ExternalCatalog - extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] { +trait ExternalCatalog { --- End diff -- Hopefully, the next release.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89889/testReport)** for PR 21068 at commit [`0ba8510`](https://github.com/apache/spark/commit/0ba85108584d4e2c5649679a10543f9d2cfe367c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89888/testReport)** for PR 21121 at commit [`bcd52bd`](https://github.com/apache/spark/commit/bcd52bd80f49fa8d2f32524044ed6c1a342b2bd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrayJoin(` * `case class Flatten(child: Expression) extends UnaryExpression ` * `case class MonthsBetween(` * `trait QueryPlanConstraints extends ConstraintHelper ` * `trait ConstraintHelper ` * `case class CachedRDDBuilder(` * `case class InMemoryRelation(` * `case class WriteToContinuousDataSource(` * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
[GitHub] spark pull request #21171: [SPARK-24104] SQLAppStatusListener overwrites met...
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/21171 [SPARK-24104] SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them ## What changes were proposed in this pull request? Event `SparkListenerDriverAccumUpdates` may happen multiple times in a query - e.g. every `FileSourceScanExec` and `BroadcastExchangeExec` call `postDriverMetricUpdates`. In Spark 2.2 `SQLListener` updated the map with new values. `SQLAppStatusListener` overwrites it. Unless `update` preserved it in the KV store (dependent on `exec.lastWriteTime`), only the metrics from the last operator that does `postDriverMetricUpdates` are preserved. ## How was this patch tested? Unit test added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/juliuszsompolski/apache-spark SPARK-24104 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21171 commit 69e09cc770514bdb7964b8552456bf7a83df7588 Author: Juliusz Sompolski Date: 2018-04-26T17:52:04Z onDriverAccumUpdates
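The essence of the bug: each driver-side event carries only that operator's metrics, so replacing the stored map drops earlier operators' values, while merging keeps them. A minimal model with illustrative names, not the actual listener code:

```python
class ExecMetrics:
    """Toy stand-in for the per-execution state kept by the listener."""

    def __init__(self):
        self.driver_accum_updates = {}

    def on_update_overwrite(self, updates):
        # Buggy behavior: the new event replaces everything seen so far.
        self.driver_accum_updates = dict(updates)

    def on_update_merge(self, updates):
        # Fixed behavior: accumulate metrics across events.
        self.driver_accum_updates.update(updates)

scan = {1: 100}  # e.g. metrics posted by a scan operator
bcast = {2: 7}   # e.g. metrics posted by a broadcast exchange

buggy, fixed = ExecMetrics(), ExecMetrics()
for ev in (scan, bcast):
    buggy.on_update_overwrite(ev)
    fixed.on_update_merge(ev)

print(buggy.driver_accum_updates)  # {2: 7} - the scan's metrics are lost
print(fixed.driver_accum_updates)  # {1: 100, 2: 7}
```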
[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to ReadTask...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21145 IMO, it's better to keep the current naming. `DataReaderFactory` implies something that produces a `DataReader`, which it does, whereas `ReadTask` gives the impression that it does some reading itself, when what it really does is `createDataReader`. The current naming keeps the read and write sides consistent, and I think the renaming would add to the already confusing interfaces.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89896/testReport)** for PR 21014 at commit [`0038a23`](https://github.com/apache/spark/commit/0038a2304b8c07031a114c64531cfcbafc5085c9).
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89894/testReport)** for PR 17086 at commit [`6906dc4`](https://github.com/apache/spark/commit/6906dc4a9fb8d6efde9fb1763c79ad9381fc8dea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89889/ Test FAILed.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Merged build finished. Test FAILed.
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89898/testReport)** for PR 21068 at commit [`4df2311`](https://github.com/apache/spark/commit/4df231177343e6be04ec76d8c65e886763a5a152).
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r184466157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,78 @@ case class Flatten(child: Expression) extends UnaryExpression { override def prettyName: String = "flatten" } + +/** + * Removes duplicate values from the array. + */ +@ExpressionDescription( + usage = "_FUNC_(array) - Removes duplicate values from the array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3)); + [1,2,3,null] + """, since = "2.4.0") +case class ArrayDistinct(child: Expression) + extends UnaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType) + + override def dataType: DataType = child.dataType + + override def nullSafeEval(array: Any): Any = { +val elementType = child.dataType.asInstanceOf[ArrayType].elementType +val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType).distinct +new GenericArrayData(data.asInstanceOf[Array[Any]]) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val elementType = dataType.asInstanceOf[ArrayType].elementType +nullSafeCodeGen(ctx, ev, (array) => { + val arrayClass = classOf[GenericArrayData].getName + val tempArray = ctx.freshName("tempArray") + val distinctArray = ctx.freshName("distinctArray") + val i = ctx.freshName("i") + val j = ctx.freshName("j") + val pos = ctx.freshName("arrayPosition") + val getValue1 = CodeGenerator.getValue(array, elementType, i) + val getValue2 = CodeGenerator.getValue(array, elementType, j) + s""" + |int $pos = 0; + |Object[] $tempArray = new Object[$array.numElements()]; + |for (int $i = 0; $i < $array.numElements(); $i ++) { + | if ($array.isNullAt($i)) { + | int $j; + | for ($j = 0; $j < $i; $j ++) { + | if ($array.isNullAt($j)) + | break; + | } + | if ($i == $j) { + | $tempArray[$pos] = null; + | $pos = $pos + 
1; + | } + | } + | else { + |int $j; + |for ($j = 0; $j < $i; $j ++) { + | if (${ctx.genEqual(elementType, getValue1, getValue2)}) --- End diff -- Shouldn't you check `$array.isNullAt($j)` in this loop as well? Especially when `elementType` is primitive?
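Behaviorally, `array_distinct` keeps the first occurrence of each value, including at most one null; the interpreted path above is essentially Scala's `.distinct`. A small Python model of the intended result (not the generated Java being reviewed):

```python
def array_distinct(arr):
    """Remove duplicates while preserving first-occurrence order.

    None models SQL NULL; as with the null scan in the generated code,
    at most one None survives. Quadratic on purpose, mirroring the
    nested i/j loops in the codegen above.
    """
    if arr is None:
        return None
    out = []
    for x in arr:
        if x not in out:  # linear scan over kept elements, like the j-loop
            out.append(x)
    return out

print(array_distinct([1, 2, 3, None, 3]))  # [1, 2, 3, None], as in the doc example
```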
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89894/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89888/ Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/21171 cc @gengliangwang @vanzin
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21171 **[Test build #89897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89897/testReport)** for PR 21171 at commit [`69e09cc`](https://github.com/apache/spark/commit/69e09cc770514bdb7964b8552456bf7a83df7588).
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Merged build finished. Test PASSed.
[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21168 Can one of the admins verify this patch?
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184425723 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 +val tpRate0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val tpRate1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val tpRate2 = (1.0 * w2) / (1.0 * w2 + 0) +val fpRate0 = (1.0 * w2) / (tw - (2.0 * w1 + 1.0 * w2 + 1.0 * w1)) +val fpRate1 = (1.0 * w2) / (tw - (1.0 * w2 + 2.0 * w1 + 1.0 * w2)) +val fpRate2 = (1.0 * w1) / (tw - (1.0 * w2)) +val precision0 = (2.0 * w1) / (2 * w1 + 1 * w2) +val precision1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val precision2 = (1.0 * w2) / (1 * w1 + 1 * w2) +val recall0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val recall1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) 
+val recall2 = (1.0 * w2) / (1.0 * w2 + 0) +val f1measure0 = 2 * precision0 * recall0 / (precision0 + recall0) +val f1measure1 = 2 * precision1 * recall1 / (precision1 + recall1) +val f1measure2 = 2 * precision2 * recall2 / (precision2 + recall2) +val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0) +val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1) +val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2) + + assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray)) +assert(math.abs(metrics.truePositiveRate(0.0) - tpRate0) < delta) --- End diff -- done
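The hand-computed expectations in the test above all follow one pattern: weighted true positives divided by the weighted predicted total (precision) or the weighted actual total (recall). A compact Python check of that arithmetic, using the same data and weights as the test:

```python
def weighted_prf(rows, cls):
    """Weighted per-class precision/recall/F1 from
    (prediction, label, weight) rows, as in the test data above."""
    tp = sum(w for p, l, w in rows if p == cls and l == cls)
    pred_w = sum(w for p, l, w in rows if p == cls)    # weight of predictions == cls
    label_w = sum(w for p, l, w in rows if l == cls)   # weight of labels == cls
    precision, recall = tp / pred_w, tp / label_w
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

w1, w2 = 2.2, 1.5
rows = [(0, 0, w1), (0, 1, w2), (0, 0, w1), (1, 0, w2), (1, 1, w1),
        (1, 1, w2), (1, 1, w1), (2, 2, w2), (2, 0, w1)]

p0, r0, _ = weighted_prf(rows, 0)
# These match precision0 and recall0 as written in the Scala test:
assert abs(p0 - 2 * w1 / (2 * w1 + w2)) < 1e-9
assert abs(r0 - 2 * w1 / (2 * w1 + w2 + w1)) < 1e-9
```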
[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21158 > I was looking for documentation on these sql confs That's part of this fix, if you're talking about the old config. Since it was using `ConfigBuilder` directly instead of `buildConf`, it did not make it into the output of "SET -v". If that was intentional then probably that part should be reverted - I don't see why it should be hidden, but maybe @gatorsmile had a reason.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184439831 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Also, I don't think this can return `null`. `getMethod` throws `NoSuchMethodException` when the method is not found; otherwise it returns the method.
[GitHub] spark issue #21165: [Spark 20087][CORE] Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > we should document the changes in a migration document or something, I think documentation is necessary, will update the documentation tomorrow (Beijing time)
[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21169 Merged build finished. Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89892/ Test PASSed.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89892/testReport)** for PR 21014 at commit [`2260e15`](https://github.com/apache/spark/commit/2260e155c79a843ade705501209c43b6aee8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184451743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { + + override def checkInputDataTypes(): TypeCheckResult = { +// this check currently does not allow valueContainsNull to vary, +// and unfortunately none of the MapType toString methods include +// valueContainsNull for the error message +if (children.size < 2) { + TypeCheckResult.TypeCheckFailure( +s"$prettyName expects at least two input maps.") +} else if (children.exists(!_.dataType.isInstanceOf[MapType])) { + TypeCheckResult.TypeCheckFailure( +s"The given input of function $prettyName should all be of type map, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else if (children.map(_.dataType).distinct.length > 1) { + TypeCheckResult.TypeCheckFailure( +s"The given input maps of function $prettyName should all be the same type, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: MapType = { +children.headOption.map(_.dataType.asInstanceOf[MapType]) + .getOrElse(MapType(keyType = StringType, valueType = StringType)) + } + + override def nullable: Boolean = true + + override def eval(input: InternalRow): Any = { +val union = new util.LinkedHashMap[Any, Any]() +children.map(_.eval(input)).foreach { 
raw => + if (raw == null) { +return null + } + val map = raw.asInstanceOf[MapData] + map.foreach(dataType.keyType, dataType.valueType, (k, v) => +union.put(k, v) + ) +} +val (keyArray, valueArray) = union.entrySet().toArray().map { e => + val e2 = e.asInstanceOf[java.util.Map.Entry[Any, Any]] + (e2.getKey, e2.getValue) +}.unzip +new ArrayBasedMapData(new GenericArrayData(keyArray), new GenericArrayData(valueArray)) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val mapCodes = children.map(c => c.genCode(ctx)) +val keyType = children.head.dataType.asInstanceOf[MapType].keyType +val valueType = children.head.dataType.asInstanceOf[MapType].valueType +val mapRefArrayName = ctx.freshName("mapRefArray") +val unionMapName = ctx.freshName("union") + +val mapDataClass = classOf[MapData].getName +val arrayBasedMapDataClass = classOf[ArrayBasedMapData].getName +val arrayDataClass = classOf[ArrayData].getName +val genericArrayDataClass = classOf[GenericArrayData].getName +val hashMapClass = classOf[util.LinkedHashMap[Any, Any]].getName +val entryClass = classOf[util.Map.Entry[Any, Any]].getName + +val init = + s""" +|$mapDataClass[] $mapRefArrayName = new $mapDataClass[${mapCodes.size}]; +|boolean ${ev.isNull} = false; +|$mapDataClass ${ev.value} = null; + """.stripMargin + +val assignments = mapCodes.zipWithIndex.map { case (m, i) => + val initCode = mapCodes(i).code + val valueVarName = mapCodes(i).value.code + s""" + |$initCode + |$mapRefArrayName[$i] = $valueVarName; + |if ($valueVarName == null) { + | ${ev.isNull} = true; + |} + """.stripMargin +}.mkString("\n") + +val index1Name = ctx.freshName("idx1") +val index2Name = ctx.freshName("idx2") +val mapDataName = ctx.freshName("m") +val kaName = ctx.freshName("ka") +val vaName = ctx.freshName("va") +val keyName = ctx.freshName("key") +val valueName = ctx.freshName("value") + +val mapMerge = + s""" +|$hashMapClass
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r18276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { --- End diff -- Do you need to inherit from `CodegenFallback` if you've overridden `doGenCode`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
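The eval() path in the MapConcat implementation above puts every entry of every input map into a LinkedHashMap, so on duplicate keys the last map wins, and any null input makes the whole result null. A minimal Python sketch of those merge semantics (illustrative only; the real code operates on Spark's MapData, and the function name here is just a stand-in):

```python
def map_concat(*maps):
    """Union of the given maps. Later maps overwrite earlier keys,
    mirroring LinkedHashMap.put in the eval() above; a null (None)
    input nulls the entire result."""
    union = {}  # Python dicts preserve insertion order, like LinkedHashMap
    for m in maps:
        if m is None:
            return None
        union.update(m)  # duplicate keys take the later map's value
    return union
```

This reproduces the documented example: merging `map(1, 'a', 2, 'b')` with `map(2, 'c', 3, 'd')` yields key 2 bound to 'c'.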
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184452750 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -56,6 +58,28 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(MapValues(m2), null) } + test("Map Concat") { --- End diff -- Please add more test cases with `null` values.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184452242 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) --- End diff -- since = "2.4.0"
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r184435943 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -116,6 +118,153 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { + + override def checkInputDataTypes(): TypeCheckResult = { +// this check currently does not allow valueContainsNull to vary, +// and unfortunately none of the MapType toString methods include +// valueContainsNull for the error message +if (children.size < 2) { + TypeCheckResult.TypeCheckFailure( +s"$prettyName expects at least two input maps.") +} else if (children.exists(!_.dataType.isInstanceOf[MapType])) { + TypeCheckResult.TypeCheckFailure( +s"The given input of function $prettyName should all be of type map, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else if (children.map(_.dataType).distinct.length > 1) { + TypeCheckResult.TypeCheckFailure( +s"The given input maps of function $prettyName should all be the same type, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: MapType = { +children.headOption.map(_.dataType.asInstanceOf[MapType]) + .getOrElse(MapType(keyType = StringType, valueType = StringType)) + } + + override def nullable: Boolean = true --- End diff -- What about `override def nullable: Boolean = children.exists(_.nullable)`? 
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2699/ Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89886/ Test PASSed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89887/testReport)** for PR 21121 at commit [`51c8199`](https://github.com/apache/spark/commit/51c8199623837de6d2eeb5012d95635e056d4c04). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrayJoin(` * `case class Flatten(child: Expression) extends UnaryExpression ` * `case class MonthsBetween(` * `trait QueryPlanConstraints extends ConstraintHelper ` * `trait ConstraintHelper ` * `case class CachedRDDBuilder(` * `case class InMemoryRelation(` * `case class WriteToContinuousDataSource(` * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89887/ Test FAILed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test FAILed.
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r184471365 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,78 @@ case class Flatten(child: Expression) extends UnaryExpression { override def prettyName: String = "flatten" } + +/** + * Removes duplicate values from the array. + */ +@ExpressionDescription( + usage = "_FUNC_(array) - Removes duplicate values from the array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3)); + [1,2,3,null] + """, since = "2.4.0") +case class ArrayDistinct(child: Expression) + extends UnaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType) + + override def dataType: DataType = child.dataType + + override def nullSafeEval(array: Any): Any = { +val elementType = child.dataType.asInstanceOf[ArrayType].elementType +val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType).distinct +new GenericArrayData(data.asInstanceOf[Array[Any]]) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val elementType = dataType.asInstanceOf[ArrayType].elementType +nullSafeCodeGen(ctx, ev, (array) => { + val arrayClass = classOf[GenericArrayData].getName + val tempArray = ctx.freshName("tempArray") + val distinctArray = ctx.freshName("distinctArray") + val i = ctx.freshName("i") + val j = ctx.freshName("j") + val pos = ctx.freshName("arrayPosition") + val getValue1 = CodeGenerator.getValue(array, elementType, i) + val getValue2 = CodeGenerator.getValue(array, elementType, j) + s""" + |int $pos = 0; + |Object[] $tempArray = new Object[$array.numElements()]; --- End diff -- Just wondering about cases with big arrays with lots of duplicated values. In such cases, `$tempArray` is unnecessarily big. What about performing the filtering in two runs?
The first run would calculate the result array size and the second would copy items from the source to the result.
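The two-run approach suggested here can be sketched as follows (a hypothetical Python illustration of the idea, not the actual generated Java): the first pass only counts the surviving elements so the result can be allocated at its exact size, and the second pass copies them.

```python
def array_distinct_two_pass(arr):
    """Remove duplicates in two passes: pass one sizes the result,
    pass two copies first occurrences, so no temp buffer as large as
    the input is needed. Quadratic equality checks mirror the nested
    loops in the codegen under review."""
    def is_first_occurrence(i):
        # True when arr[i] did not already appear at an earlier index.
        return all(arr[j] != arr[i] for j in range(i))

    # Pass 1: count distinct elements.
    n_distinct = sum(1 for i in range(len(arr)) if is_first_occurrence(i))

    # Pass 2: copy first occurrences into an exactly-sized result.
    result = [None] * n_distinct
    pos = 0
    for i in range(len(arr)):
        if is_first_occurrence(i):
            result[pos] = arr[i]
            pos += 1
    return result
```

The trade-off being weighed is memory versus time: this variant scans the source twice but never over-allocates.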
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21014 Merged build finished. Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2700/ Test PASSed.
[GitHub] spark pull request #20936: [SPARK-23503][SS] Enforce sequencing of committed...
Github user efimpoberezkin commented on a diff in the pull request: https://github.com/apache/spark/pull/20936#discussion_r184361843 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochCoordinator.scala --- @@ -137,30 +137,65 @@ private[continuous] class EpochCoordinator( private val partitionOffsets = mutable.Map[(Long, Int), PartitionOffset]() + private var lastCommittedEpoch = startEpoch - 1 + // Remembers epochs that have to wait for previous epochs to be committed first. + private val epochsWaitingToBeCommitted = mutable.HashSet.empty[Long] + private def resolveCommitsAtEpoch(epoch: Long) = { -val thisEpochCommits = - partitionCommits.collect { case ((e, _), msg) if e == epoch => msg } +val thisEpochCommits = findCommitsForEpoch(epoch) val nextEpochOffsets = partitionOffsets.collect { case ((e, _), o) if e == epoch => o } if (thisEpochCommits.size == numWriterPartitions && nextEpochOffsets.size == numReaderPartitions) { - logDebug(s"Epoch $epoch has received commits from all partitions. Committing globally.") - // Sequencing is important here. We must commit to the writer before recording the commit - // in the query, or we will end up dropping the commit if we restart in the middle. - writer.commit(epoch, thisEpochCommits.toArray) - query.commit(epoch) - - // Cleanup state from before this epoch, now that we know all partitions are forever past it. - for (k <- partitionCommits.keys.filter { case (e, _) => e < epoch }) { -partitionCommits.remove(k) - } - for (k <- partitionOffsets.keys.filter { case (e, _) => e < epoch }) { -partitionOffsets.remove(k) + + // Check that last committed epoch is the previous one for sequencing of committed epochs. + // If not, add the epoch being currently processed to epochs waiting to be committed, + // otherwise commit it. 
+ if (lastCommittedEpoch != epoch - 1) { +logDebug(s"Epoch $epoch has received commits from all partitions " + + s"and is waiting for epoch ${epoch - 1} to be committed first.") +epochsWaitingToBeCommitted.add(epoch) + } else { +commitEpoch(epoch, thisEpochCommits) +lastCommittedEpoch = epoch + +// Commit subsequent epochs that are waiting to be committed. +var nextEpoch = lastCommittedEpoch + 1 +while (epochsWaitingToBeCommitted.contains(nextEpoch)) { + val nextEpochCommits = findCommitsForEpoch(nextEpoch) + commitEpoch(nextEpoch, nextEpochCommits) + + epochsWaitingToBeCommitted.remove(nextEpoch) + lastCommittedEpoch = nextEpoch + nextEpoch += 1 +} + +// Cleanup state from before last committed epoch, +// now that we know all partitions are forever past it. +for (k <- partitionCommits.keys.filter { case (e, _) => e < lastCommittedEpoch }) { + partitionCommits.remove(k) +} +for (k <- partitionOffsets.keys.filter { case (e, _) => e < lastCommittedEpoch }) { + partitionOffsets.remove(k) +} } } } + private def findCommitsForEpoch(epoch: Long): Iterable[WriterCommitMessage] = { --- End diff -- @tdas done
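The sequencing logic under review (commit an epoch only after its predecessor has been committed, park it otherwise, then flush any consecutive parked epochs once the gap closes) can be sketched outside Spark. `EpochSequencer` below is a made-up stand-in for `EpochCoordinator`, with a plain callback in place of the writer/query commit:

```python
class EpochSequencer:
    """Commits epochs strictly in order. An epoch whose predecessor has
    not yet been committed is parked in `waiting` and flushed later."""

    def __init__(self, start_epoch, commit_fn):
        self.last_committed = start_epoch - 1
        self.waiting = set()       # epochs ready to commit but out of order
        self.commit_fn = commit_fn

    def epoch_ready(self, epoch):
        """Called when all partitions of `epoch` have reported commits."""
        if epoch != self.last_committed + 1:
            # Predecessor not committed yet: park this epoch.
            self.waiting.add(epoch)
            return
        self.commit_fn(epoch)
        self.last_committed = epoch
        # Flush any consecutive epochs that were waiting on this one.
        while self.last_committed + 1 in self.waiting:
            nxt = self.last_committed + 1
            self.waiting.remove(nxt)
            self.commit_fn(nxt)
            self.last_committed = nxt
```

Even if epochs 1 and 2 become ready before epoch 0, the commit callback still observes the order 0, 1, 2.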
[GitHub] spark issue #21167: [SPARK-24100][PYSPARK][DStreams]Add the CompressionCodec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21167 Can one of the admins verify this patch?
[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21152 **[Test build #89877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89877/testReport)** for PR 21152 at commit [`3eee5c9`](https://github.com/apache/spark/commit/3eee5c99e891906015cc5fcae09acaae0091da86). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21158 Looks good. I didn't have a jdbc connection handy to test with. I was looking for documentation on these sql confs and I don't see them in the SET -v output, did we decide not to document these? I haven't looked to see how that works.
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184420934 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 --- End diff -- done
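The weighted metrics this test exercises generalize the unweighted ones by summing instance weights instead of counting instances: each (prediction, label) pair contributes its weight to the true-positive, false-positive, or false-negative total of a class. A small Python sketch of per-class weighted precision and recall (a hypothetical helper, not part of the MulticlassMetrics API):

```python
def weighted_precision_recall(rows, cls):
    """rows: iterable of (prediction, label, weight) triples.
    Returns (precision, recall) for class `cls`, with each row
    contributing its weight rather than a count of 1."""
    tp = fp = fn = 0.0
    for pred, label, w in rows:
        if pred == cls and label == cls:
            tp += w          # correctly predicted as cls
        elif pred == cls:
            fp += w          # predicted cls, but label differs
        elif label == cls:
            fn += w          # was cls, but predicted otherwise
    return tp / (tp + fp), tp / (tp + fn)
```

On the test's data with w1 = 2.2 and w2 = 1.5, class 0 gets precision 2*w1 / (2*w1 + w2) and recall 2*w1 / (2*w1 + w2 + w1), matching the expected values in the suite above.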
[GitHub] spark issue #21159: [SPARK-24057][PYTHON]put the real data type in the Asser...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21159 merged to master, thanks @huaxingao !
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Can one of the admins verify this patch?
[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89903/testReport)** for PR 21068 at commit [`17bbbee`](https://github.com/apache/spark/commit/17bbbee0cf952a32e44fd0767bba08814e351da2).
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21171 **[Test build #89897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89897/testReport)** for PR 21171 at commit [`69e09cc`](https://github.com/apache/spark/commit/69e09cc770514bdb7964b8552456bf7a83df7588). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89897/ Test PASSed.
[GitHub] spark issue #21171: [SPARK-24104] SQLAppStatusListener overwrites metrics on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21171 Merged build finished. Test PASSed.
[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21136#discussion_r184528290 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -771,6 +778,16 @@ class UnsupportedOperationsSuite extends SparkFunSuite { } } + /** Assert that the logical plan is not supportd for continuous procsssing mode */ --- End diff -- is not supported -> is supported
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21095 @madanadit sorry I didn't get a chance to look into this. I will take a detailed look once @foxish's comments on testing are addressed.
[GitHub] spark pull request #21159: [SPARK-24057][PYTHON]put the real data type in th...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21159
[GitHub] spark issue #21159: [SPARK-24057][PYTHON]put the real data type in the Asser...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21159 Thanks @BryanCutler @HyukjinKwon
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89885/ Test PASSed.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21136 Merged build finished. Test PASSed.
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184430271 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext { ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta) assert(metrics.labels.sameElements(labels)) } + + test("Multiclass evaluation metrics with weights") { +/* + * Confusion matrix for 3-class classification with total 9 instances with 2 weights: + * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances) + * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances) + * |0 |0 |1 * w2| true class2 (1 instance) + */ +val w1 = 2.2 +val w2 = 1.5 +val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2 +val confusionMatrix = Matrices.dense(3, 3, + Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2)) +val labels = Array(0.0, 1.0, 2.0) +val predictionAndLabelsWithWeights = sc.parallelize( + Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2), +(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2), +(2.0, 0.0, w1)), 2) +val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights) +val delta = 0.001 +val tpRate0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val tpRate1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val tpRate2 = (1.0 * w2) / (1.0 * w2 + 0) +val fpRate0 = (1.0 * w2) / (tw - (2.0 * w1 + 1.0 * w2 + 1.0 * w1)) +val fpRate1 = (1.0 * w2) / (tw - (1.0 * w2 + 2.0 * w1 + 1.0 * w2)) +val fpRate2 = (1.0 * w1) / (tw - (1.0 * w2)) +val precision0 = (2.0 * w1) / (2 * w1 + 1 * w2) +val precision1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) +val precision2 = (1.0 * w2) / (1 * w1 + 1 * w2) +val recall0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1) +val recall1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2) 
+val recall2 = (1.0 * w2) / (1.0 * w2 + 0) +val f1measure0 = 2 * precision0 * recall0 / (precision0 + recall0) +val f1measure1 = 2 * precision1 * recall1 / (precision1 + recall1) +val f1measure2 = 2 * precision2 * recall2 / (precision2 + recall2) +val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0) +val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1) +val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2) + + assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray)) +assert(math.abs(metrics.truePositiveRate(0.0) - tpRate0) < delta) +assert(math.abs(metrics.truePositiveRate(1.0) - tpRate1) < delta) +assert(math.abs(metrics.truePositiveRate(2.0) - tpRate2) < delta) +assert(math.abs(metrics.falsePositiveRate(0.0) - fpRate0) < delta) +assert(math.abs(metrics.falsePositiveRate(1.0) - fpRate1) < delta) +assert(math.abs(metrics.falsePositiveRate(2.0) - fpRate2) < delta) +assert(math.abs(metrics.precision(0.0) - precision0) < delta) +assert(math.abs(metrics.precision(1.0) - precision1) < delta) +assert(math.abs(metrics.precision(2.0) - precision2) < delta) +assert(math.abs(metrics.recall(0.0) - recall0) < delta) +assert(math.abs(metrics.recall(1.0) - recall1) < delta) +assert(math.abs(metrics.recall(2.0) - recall2) < delta) +assert(math.abs(metrics.fMeasure(0.0) - f1measure0) < delta) +assert(math.abs(metrics.fMeasure(1.0) - f1measure1) < delta) +assert(math.abs(metrics.fMeasure(2.0) - f1measure2) < delta) +assert(math.abs(metrics.fMeasure(0.0, 2.0) - f2measure0) < delta) +assert(math.abs(metrics.fMeasure(1.0, 2.0) - f2measure1) < delta) +assert(math.abs(metrics.fMeasure(2.0, 2.0) - f2measure2) < delta) + +assert(math.abs(metrics.accuracy - + (2.0 * w1 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2) / tw) < delta) +assert(math.abs(metrics.accuracy - metrics.precision) < delta) +assert(math.abs(metrics.accuracy - metrics.recall) < delta) +assert(math.abs(metrics.accuracy - 
metrics.fMeasure) < delta) +assert(math.abs(metrics.accuracy - metrics.weightedRecall) < delta) +assert(math.abs(metrics.weightedTruePositiveRate - + (((2 * w1 + 1 * w2 + 1 * w1) / tw) * tpRate0 + +((1 * w2 + 2 * w1 + 1 * w2) / tw) * tpRate1 + +(1 * w2 / tw) * tpRate2)) < delta) +assert(math.abs(metrics.weightedFalsePositiveRate - + (((2
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test FAILed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2694/ Test FAILed.
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184438550 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1417,10 +1417,13 @@ class DAGScheduler( case exceptionFailure: ExceptionFailure => // Nothing left to do, already handled above for accumulator updates. + case _: TaskKilled => --- End diff -- nit: combine this with the `ExceptionFailure` case.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user eric-maynard commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184442769 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Yes, I think you are definitely correct. It cannot be null. @vanzin is right, and the check should instead ensure the `main` being invoked is static. PR is updated, but I may decline as something is wrong with the way I am testing.
[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21169 [SPARK-23715][SQL] the input of to/from_utc_timestamp can not have timezone ## What changes were proposed in this pull request? `from_utc_timestamp` assumes its input is in the UTC timezone and shifts it to the specified timezone. When the timestamp contains a timezone (e.g. `2018-03-13T06:18:23+00:00`), Spark breaks this semantic and respects the timezone in the string. This is not what users expect, and the result is different from Hive/Scala. `to_utc_timestamp` has the same problem. This PR fixes this by returning null if the input timestamp contains a timezone. TODO: add a config ## How was this patch tested? new tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark from_utc_timezone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21169 commit 7c1dcc3f3c144fe2aa1296c84840ff27a5a250e1 Author: Wenchen Fan Date: 2018-04-26T16:01:38Z SPARK-23715: the input of to/from_utc_timestamp can not have timezone
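The proposed semantics, rejecting inputs that carry their own timezone offset, can be sketched with plain `java.time`. This is a hypothetical illustration of the idea only, not Spark's implementation; the helper name is made up, and the actual shift from UTC to a target timezone is omitted:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

public class UtcTimestampSketch {
    // If the string carries an explicit offset such as "+00:00", return null
    // (the behavior this PR proposes); otherwise treat the string as a UTC
    // wall-clock time.
    static LocalDateTime fromUtcTimestamp(String s) {
        try {
            OffsetDateTime.parse(s); // succeeds only when an offset is present
            return null;             // timezone in the input -> null result
        } catch (DateTimeParseException e) {
            return LocalDateTime.parse(s); // no offset: plain wall-clock time
        }
    }
}
```

The key point mirrored here is that an offset in the string changes the outcome entirely, rather than being silently honored.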
[GitHub] spark pull request #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala t...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/21170 [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString error ## What changes were proposed in this pull request? Fix `memoryV2.scala` toString error ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22732 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21170
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/20930#discussion_r184462542 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,84 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLimits {
     }
   }

+  /**
+   * This tests the case where the original task succeeds after its speculative
+   * attempt got a FetchFailed earlier.
+   */
+  test("SPARK-23811: ShuffleMapStage failed by FetchFailed should ignore following " +
+    "successful tasks") {
+    // Create 3 RDDs with shuffle dependencies on each other: rddA <--- rddB <--- rddC
+    val rddA = new MyRDD(sc, 2, Nil)
+    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+    val shuffleIdA = shuffleDepA.shuffleId
+
+    val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = mapOutputTracker)
+    val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+    val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = mapOutputTracker)
+
+    submit(rddC, Array(0, 1))
+
+    // Complete both tasks in rddA.
+    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 2)),
+      (Success, makeMapStatus("hostB", 2))))
+
+    // The first task succeeds.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+    // The second task's speculative attempt fails first, but the task itself is
+    // still running. This may be caused by ExecutorLost.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0, "ignored"),
+      null))
+    // Check the currently missing partition.
+    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 1)
+    // The second task itself succeeds soon after.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
+    // The number of missing partitions should not change, otherwise the child
+    // stage can never succeed.
+    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 1)
+  }
+
+  test("SPARK-23811: check ResultStage failed by FetchFailed can ignore following " +
+    "successful tasks") {
+    val rddA = new MyRDD(sc, 2, Nil)
+    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+    val shuffleIdA = shuffleDepA.shuffleId
+    val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = mapOutputTracker)
+    submit(rddB, Array(0, 1))
+
+    // Complete both tasks in rddA.
+    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 2)),
+      (Success, makeMapStatus("hostB", 2))))
+
+    // The first task of rddB succeeds.
+    assert(taskSets(1).tasks(0).isInstanceOf[ResultTask[_, _]])
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+    // The second task's speculative attempt fails first, but the task itself is
+    // still running. This may be caused by ExecutorLost.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0, "ignored"),
+      null))
+    // Make sure failedStages is not empty now.
+    assert(scheduler.failedStages.nonEmpty)
+    // The second result task itself succeeds soon after.
+    assert(taskSets(1).tasks(1).isInstanceOf[ResultTask[_, _]])
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
+    assertDataStructuresEmpty()
--- End diff --
Right. It is a check that we are cleaning up the contents of the DAGScheduler's data structures so that they do not grow without bound over time.
[GitHub] spark pull request #21168: added check to ensure main method is found [SPARK...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21168#discussion_r184428171 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -675,9 +675,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends val userThread = new Thread { override def run() { try { - mainMethod.invoke(null, userArgs.toArray) - finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) - logDebug("Done running users class") + if(mainMethod == null) { --- End diff -- Can this be `null`?
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2695/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89891/testReport)** for PR 17086 at commit [`089d64b`](https://github.com/apache/spark/commit/089d64bca76b26f0f90a6dd591fc4218b8ab3700).
[GitHub] spark issue #21165: Spark 20087: Attach accumulators / metrics to 'TaskKille...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21165 I'm not against the change, but since this changes the semantics of accumulators, we should document the changes in a migration document or something, WDYT @cloud-fan @gatorsmile ?
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184440128 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi -case class TaskKilled(reason: String) extends TaskFailedReason { - override def toErrorString: String = s"TaskKilled ($reason)" +case class TaskKilled( +reason: String, +accumUpdates: Seq[AccumulableInfo] = Seq.empty, +private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil) + extends TaskFailedReason { + + override def toErrorString: String = "TaskKilled ($reason)" --- End diff -- Will do. Didn't notice the `s` is missing.
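The bug caught above is Scala's silent acceptance of `$reason` inside a plain string literal when the `s` interpolator prefix is dropped: the text is emitted verbatim instead of being substituted. The same failure mode can be mimicked in Java by forgetting to route a template through `String.format` (names here are illustrative, not Spark code):

```java
public class InterpolationSlip {
    // The placeholder is never substituted; analogous to a Scala string
    // literal missing its `s` interpolator prefix. Note the argument is
    // silently ignored.
    static String wrong(String reason) {
        return "TaskKilled (%s)";
    }

    // The intended behavior: the reason is substituted into the template.
    static String right(String reason) {
        return String.format("TaskKilled (%s)", reason);
    }
}
```

In both languages the compiler accepts the broken version, which is why this slip survives until someone reads the output.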
[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184441076 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1417,10 +1417,13 @@ class DAGScheduler( case exceptionFailure: ExceptionFailure => // Nothing left to do, already handled above for accumulator updates. + case _: TaskKilled => --- End diff -- will do.
[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21014 **[Test build #89892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89892/testReport)** for PR 21014 at commit [`2260e15`](https://github.com/apache/spark/commit/2260e155c79a843ade705501209c43b6aee8).
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > It should be [Spark-20087] instead of [Spark 20087] in the title.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #89894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89894/testReport)** for PR 17086 at commit [`6906dc4`](https://github.com/apache/spark/commit/6906dc4a9fb8d6efde9fb1763c79ad9381fc8dea).
[GitHub] spark issue #21170: [SPARK-22732][SS][FOLLOW-UP] Fix memoryV2.scala toString...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21170 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2698/ Test PASSed.