[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21165 I think we can just update MimaExcludes, since it's a developer API. cc @JoshRosen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21114
Can we do this?
```
var acc = ...
... // launch a long running job
val accId = acc.getId
acc = null
gc
... // job finished
```
Accumulators are created by users, so we have to be prepared for any situation. That's why we use weak references in the first place.
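The scenario in the comment above can be sketched outside Spark. The following is a minimal Python illustration, not Spark's implementation: `Accumulator` and `AccumulatorRegistry` are hypothetical names, and the registry uses `weakref.WeakValueDictionary` so that a lookup returns `None` instead of raising once the user's accumulator has been collected. The immediate collection after dropping the last reference assumes CPython's reference counting.

```python
import gc
import weakref


class Accumulator:
    """Hypothetical stand-in for a user-created accumulator."""
    _next_id = 0

    def __init__(self):
        self.id = Accumulator._next_id
        Accumulator._next_id += 1
        self.value = 0


class AccumulatorRegistry:
    """Hypothetical registry that holds accumulators by weak reference only."""

    def __init__(self):
        # Entries vanish automatically once the referent is garbage collected.
        self._originals = weakref.WeakValueDictionary()

    def register(self, acc):
        self._originals[acc.id] = acc

    def get(self, acc_id):
        # Returns None, rather than raising, when the accumulator was never
        # registered or has already been collected.
        return self._originals.get(acc_id)


registry = AccumulatorRegistry()
acc = Accumulator()
registry.register(acc)
acc_id = acc.id

found_before = registry.get(acc_id) is not None
acc = None      # the user drops the only strong reference mid-job
gc.collect()    # force a collection, as in the scenario above
found_after = registry.get(acc_id) is not None
```

Because the registry never keeps the accumulator alive, any code that looks it up by id after the job finishes has to handle the missing case.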
[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21322 I think users are responsible for calling `Broadcast#destroy`, which unpersists broadcast blocks from the block manager and runs user-defined driver-side cleanup. It is a valid use case to let users define some executor-side cleanup via `AutoCloseable`. However, I don't think we should always detect `AutoCloseable` when removing a block, as it may break existing programs and cause a perf regression. We should only do it for broadcast blocks. A good place to do it seems to be `BlockManager.removeBroadcast`.
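The distinction drawn above, closing values only on the broadcast removal path and leaving generic block removal untouched, can be sketched as follows. This is a hypothetical Python analogue, not Spark code: `BlockStore`, `Resource`, and the `broadcast_<id>` naming of block ids are assumptions made for illustration, and an object with a `close()` method stands in for `AutoCloseable`.

```python
class BlockStore:
    """Hypothetical in-memory block store keyed by block id strings."""

    def __init__(self):
        self._entries = {}

    def put(self, block_id, values):
        self._entries[block_id] = list(values)

    def remove(self, block_id):
        """Generic removal: drops the entry and never inspects the values."""
        return self._entries.pop(block_id, None)

    def remove_broadcast(self, broadcast_id):
        """Broadcast-only removal: also closes values that expose close()."""
        prefix = f"broadcast_{broadcast_id}"
        matching = [b for b in self._entries
                    if b == prefix or b.startswith(prefix + "_")]
        for block_id in matching:
            for value in self._entries.pop(block_id):
                close = getattr(value, "close", None)
                if callable(close):
                    close()


class Resource:
    """Hypothetical closable value."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


store = BlockStore()
cached, broadcasted = Resource(), Resource()
store.put("rdd_0_0", [cached])
store.put("broadcast_7", [broadcasted])
store.remove("rdd_0_0")      # existing behaviour: value is left untouched
store.remove_broadcast(7)    # opt-in cleanup for broadcast blocks only
```

Keeping the check out of `remove` means non-broadcast blocks pay no extra cost and see no behaviour change.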
[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19840 @vanzin I am not very familiar with the Python part [context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191), so I handled it in `api/python/PythonRunner`, as I did in this PR. Maybe someone else could help; sorry for the delay.
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21267 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3214/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21267 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21267 **[Test build #90616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90616/testReport)** for PR 21267 at commit [`ef3555e`](https://github.com/apache/spark/commit/ef3555e389ea36159e9a1dfd076e9f6afbaf3f35). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21267 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21267 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3213/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work with Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21267 **[Test build #90614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90614/testReport)** for PR 21267 at commit [`b9e312e`](https://github.com/apache/spark/commit/b9e312ecfd0215c669e1826e891ccbaa5937ea49). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20800 **[Test build #90615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90615/testReport)** for PR 20800 at commit [`f30d3ec`](https://github.com/apache/spark/commit/f30d3ec95c0d00f409f6536d10710b2f65fad787). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21267#discussion_r188144573
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --
Likewise, I checked the warning manually:
```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file [/home/spark/tmp.py] speficied in 'spark.submit.pyFiles' to Python path:
...
/usr/lib64/python27.zip
/usr/lib64/python2.7
...
```
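The copy-then-add fallback in the diff above can be tried in isolation. The following is a minimal standalone sketch, not the PR's code: `add_py_file`, its parameters, and the `PACKAGE_EXTENSIONS` value are assumptions for illustration, and a plain directory stands in for `SparkFiles.getRootDirectory()`.

```python
import os
import shutil
import sys
import tempfile
import warnings

# Assumed set of package extensions, for illustration only.
PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')


def add_py_file(path, root_dir, python_includes):
    """Copy `path` into `root_dir` if it is missing there, then put packages on sys.path.

    Returns True on success; on any failure it warns and returns False.
    """
    _, filename = os.path.split(path)
    filepath = os.path.join(root_dir, filename)
    try:
        if not os.path.exists(filepath):
            # The file was not distributed beforehand, so copy it ourselves.
            shutil.copyfile(path, filepath)
        if filename[-4:].lower() in PACKAGE_EXTENSIONS:
            python_includes.append(filename)
            sys.path.insert(1, filepath)
        return True
    except Exception as e:
        warnings.warn(
            "Failed to add file [%s] specified in 'spark.submit.pyFiles' "
            "to Python path: %s" % (path, e),
            RuntimeWarning)
        return False


src_dir = tempfile.mkdtemp()
root_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "deps.zip")
open(src, "wb").close()  # an empty placeholder archive

includes = []
ok = add_py_file(src, root_dir, includes)

# A missing source file triggers the warning path instead of an exception.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing_ok = add_py_file(os.path.join(src_dir, "missing.zip"), root_dir, includes)
```

Catching the broad `Exception` mirrors the diff: a bad entry in the file list degrades to a warning rather than aborting context start-up.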
[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20800 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in Dataset
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20800 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...
Github user sohama4 commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r188143976
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
    */
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --
Thanks, that made sense once I looked at the code for `foreach` and `foreachPartition`; I put up a new version with this change. However, it wasn't immediately clear how the new function `withNewRDDExecutionId` would be beneficial over `withNewExecutionId`. Can you elaborate a little when you get the chance?
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r188143390 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } +/** + * Checks if the two arrays contain at least one common element. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5)); + true + """, since = "2.4.0") +// scalastyle:off line.size.limit +case class ArraysOverlap(left: Expression, right: Expression) + extends BinaryArrayExpressionWithImplicitCast { + + override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match { +case TypeCheckResult.TypeCheckSuccess => + if (RowOrdering.isOrderable(elementType)) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.") + } +case failure => failure + } + + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(elementType) + + @transient private lazy val elementTypeSupportEquals = elementType match { +case BinaryType => false +case _: AtomicType => true +case _ => false + } + + @transient private lazy val doEvaluation = if (elementTypeSupportEquals) { +fastEval _ + } else { +bruteForceEval _ + } + + override def dataType: DataType = BooleanType + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull || + right.dataType.asInstanceOf[ArrayType].containsNull + } + + 
override def nullSafeEval(a1: Any, a2: Any): Any = { +doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData]) + } + + /** + * A fast implementation which puts all the elements from the smaller array in a set + * and then performs a lookup on it for each element of the bigger one. + * This eval mode works only for data types which implements properly the equals method. + */ + private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = { +var hasNull = false +val (bigger, smaller) = if (arr1.numElements() > arr2.numElements()) { + (arr1, arr2) +} else { + (arr2, arr1) +} +if (smaller.numElements() > 0) { + val smallestSet = new mutable.HashSet[Any] + smaller.foreach(elementType, (_, v) => +if (v == null) { + hasNull = true +} else { + smallestSet += v +}) + bigger.foreach(elementType, (_, v1) => +if (v1 == null) { + hasNull = true +} else if (smallestSet.contains(v1)) { + return true +} + ) +} +if (hasNull) { + null +} else { + false +} + } + + /** + * A slower evaluation which performs a nested loop and supports all the data types. + */ + private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = { +var hasNull = false +if (arr1.numElements() > 0) { + arr1.foreach(elementType, (_, v1) => +if (v1 == null) { + hasNull = true +} else { + arr2.foreach(elementType, (_, v2) => +if (v1 == null) { + hasNull = true +} else if (ordering.equiv(v1, v2)) { + return true +} + ) +}) +} +if (hasNull) { + null +} else { + false +} + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (a1, a2) => { + val smaller = ctx.freshName("smallerArray") + val bigger = ctx.freshName("biggerArray") + val comparisonCode = if (elementTypeSupportEquals) { +fastCodegen(ctx, ev, smaller, bigger) + } else { +bruteForceCodegen(ctx, ev, smaller, bigger) + } + s""" + |ArrayData $smaller; + |ArrayData $bigger; + |if ($a1.numElements() > $a2.numElements()) { +
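The three-valued result described in the usage string of the (truncated) diff above is easy to misread, so here is a small Python sketch of the same semantics. It mirrors the set-based strategy of `fastEval` but is not the Spark implementation: the function name `arrays_overlap` is reused only for illustration, Python lists stand in for `ArrayData`, and `None` stands in for SQL null.

```python
def arrays_overlap(a1, a2):
    """Return True, False, or None (SQL null) following the documented rule.

    True  -> the arrays share at least one non-null element.
    None  -> no common element, both arrays non-empty, and a null was seen.
    False -> otherwise.
    """
    # Hash the smaller array and probe it with the bigger one.
    bigger, smaller = (a1, a2) if len(a1) > len(a2) else (a2, a1)
    has_null = False
    if len(smaller) > 0:
        seen = set()
        for v in smaller:
            if v is None:
                has_null = True
            else:
                seen.add(v)
        for v in bigger:
            if v is None:
                has_null = True
            elif v in seen:
                return True
    return None if has_null else False


common = arrays_overlap([1, 2, 3], [3, 4, 5])
unknown = arrays_overlap([1, 2], [3, None])
disjoint = arrays_overlap([1, 2], [3, 4])
empty_side = arrays_overlap([], [None])
```

Note that an empty array on either side yields `False` even when the other side contains a null, because the null check only runs when the smaller array is non-empty.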
[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90611/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21183 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21183 **[Test build #90611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90611/testReport)** for PR 21183 at commit [`7ee0ebf`](https://github.com/apache/spark/commit/7ee0ebf028e41719514c0588d378cb515aea744a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90608/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21323 **[Test build #90608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90608/testReport)** for PR 21323 at commit [`56437da`](https://github.com/apache/spark/commit/56437da708fc12d2c9216a1365a8afd6f81af845). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #90613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90613/testReport)** for PR 21221 at commit [`10ed328`](https://github.com/apache/spark/commit/10ed328bfcf160711e7619aac23472f97bf1c976). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21153 **[Test build #90612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90612/testReport)** for PR 21153 at commit [`dc59375`](https://github.com/apache/spark/commit/dc593754c62d2daf89331ea21d9250af9b9febfd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21153 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90612/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r188136553
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(reportHeartbeat: () => Unit, intervalMs: Long) {
+  // Executor for the heartbeat task
+  private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")
--- End diff --
Changed.
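The `Heartbeater` in the diff above wraps a reporting function and calls it at a fixed interval on a daemon thread. A rough Python analogue is sketched below under stated assumptions: it swaps the scheduled executor for a plain thread that waits on a `threading.Event`, and the class layout, `start`/`stop` methods, and seconds-based interval are choices made for this illustration rather than the PR's API.

```python
import threading
import time


class Heartbeater:
    """Hypothetical analogue: calls report_heartbeat every interval_s seconds."""

    def __init__(self, report_heartbeat, interval_s):
        self._report = report_heartbeat
        self._interval_s = interval_s
        self._stopped = threading.Event()
        self._thread = threading.Thread(
            target=self._run, name="heartbeater", daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (time to send a heartbeat)
        # and True once stop() has been called.
        while not self._stopped.wait(self._interval_s):
            self._report()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()


beats = []
hb = Heartbeater(lambda: beats.append(time.monotonic()), interval_s=0.01)
hb.start()
time.sleep(0.2)
hb.stop()
count_at_stop = len(beats)
time.sleep(0.05)  # no further heartbeats should arrive after stop()
```

Waiting on the event instead of sleeping lets `stop()` interrupt the wait immediately, so shutdown does not have to sit out a full interval.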
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r188136532
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1753,9 +1766,21 @@ class DAGScheduler(
     messageScheduler.shutdownNow()
     eventProcessLoop.stop()
     taskScheduler.stop()
+    heartbeater.stop()
+  }
+
+  /** Reports heartbeat metrics for the driver. */
+  private def reportHeartBeat(): Unit = {
--- End diff --
It's a bit redundant for fields that aren't used by the driver -- for the driver, execution memory gets set to 0.
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21312
It looks like the `ListVector` also needs `setLastSet` to be called with 0, which is only in `ListVector`. This is fine though, since `ListVector` is the only vector extending `BaseRepeatedValueVector`.
```
case listVector: ListVector =>
  val buffers = listVector.getBuffers(false)
  buffers.foreach(buf => buf.setByte(0, buf.capacity()))
  listVector.setValueCount(0)
  listVector.setLastSet(0)
```
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3212/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21153 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Ok. I will use manual reset for now and leave a TODO comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21312 I'm okay with either way. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21291 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3211/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21291 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/21183 I tested loading models saved with Spark 2.3, and they load fine with this change. For the tests in LDASuite, I do see them fail sometimes without this fix, though not always. I can remove it if you think it is not necessary.
[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21183 **[Test build #90611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90611/testReport)** for PR 21183 at commit [`7ee0ebf`](https://github.com/apache/spark/commit/7ee0ebf028e41719514c0588d378cb515aea744a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21153 **[Test build #90612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90612/testReport)** for PR 21153 at commit [`dc59375`](https://github.com/apache/spark/commit/dc593754c62d2daf89331ea21d9250af9b9febfd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21291 **[Test build #90610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90610/testReport)** for PR 21291 at commit [`f93738b`](https://github.com/apache/spark/commit/f93738be3a7509d70568b3060a0cc4dd3ff23da0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r188132284 --- Diff: python/pyspark/ml/util.py --- @@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None): - sparkVersion - uid - paramMap +- defalutParamMap (since 2.4.0) --- End diff -- Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r187129517 --- Diff: python/pyspark/ml/util.py --- @@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None): - sparkVersion - uid - paramMap +- defalutParamMap (since 2.4.0) --- End diff -- typo: default --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21291#discussion_r188131133
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with SharedSQLContext {
     def computeChiSquareTest(): Double = {
       val n = 1
       // Trigger a sort
-      val data = spark.range(0, n, 1, 1).sort('id.desc)
+      // Range has range partitioning in its output now. To have a range shuffle, we
+      // need to run a repartition first.
+      val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
--- End diff --
This test requires a range shuffle. Previously `range` had unknown output partitioning/ordering, so a range shuffle was inserted before `sort`. Now `range` has an ordered output, so the planner doesn't insert the shuffle we need here.
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21291#discussion_r188130563 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext { requiredOrdering = Seq(orderingA, orderingB), shouldHaveSort = true) } + + test("SPARK-24242: RangeExec should have correct output ordering and partitioning") { +val df = spark.range(10) +val rangeExec = df.queryExecution.executedPlan.collect { + case r: RangeExec => r +} +val range = df.queryExecution.optimizedPlan.collect { + case r: Range => r +} +assert(rangeExec.head.outputOrdering == range.head.outputOrdering) +assert(rangeExec.head.outputPartitioning == + RangePartitioning(rangeExec.head.outputOrdering, df.rdd.getNumPartitions)) + +val rangeInOnePartition = spark.range(1, 10, 1, 1) +val rangeExecInOnePartition = rangeInOnePartition.queryExecution.executedPlan.collect { + case r: RangeExec => r +} +assert(rangeExecInOnePartition.head.outputPartitioning == SinglePartition) --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21239 **[Test build #90609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90609/testReport)** for PR 21239 at commit [`41577c3`](https://github.com/apache/spark/commit/41577c35a7c59ffcf48225fbc30b0dc3c8cab674). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21239 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21239 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21322#discussion_r188128515 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -384,15 +385,36 @@ private[spark] class MemoryStore( } } + private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = { +entry match { + case SerializedMemoryEntry(buffer, _, _) => buffer.dispose() + case DeserializedMemoryEntry(objs: Array[Any], _, _) => maybeCloseValues(objs) + case _ => +} + } + + private def maybeCloseValues(objs: Array[Any]): Unit = { +objs.foreach { +case closable: AutoCloseable => --- End diff -- indent style: two spaces. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21322#discussion_r188128177
--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,36 @@ private[spark] class MemoryStore(
       }
     }
 
+  private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
+    entry match {
+      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+      case DeserializedMemoryEntry(objs: Array[Any], _, _) => maybeCloseValues(objs)
--- End diff --
As far as I know, broadcast variables can also be serialized to disk (`BlockManager.doPutIterator`). In that case, it seems `AutoCloseable` broadcast variables won't hit this release logic.
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 @BryanCutler I had the same thought but wondered whether it is a good idea. If you, @HyukjinKwon, and @icexelloss also agree on a manual reset like this, I'm fine with it.
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21153 **[Test build #4179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4179/testReport)** for PR 21153 at commit [`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Build finished. Test PASSed.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90603/ Test PASSed.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21199 **[Test build #90603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90603/testReport)** for PR 21199 at commit [`b3a42f0`](https://github.com/apache/spark/commit/b3a42f08cba85b9bec11aaa3f75de298aa869204). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `case class ContinuousRecordPartitionOffset(partitionId: Int, offset: Int) extends PartitionOffset` * `case class GetRecord(offset: ContinuousRecordPartitionOffset)` * `class ContinuousRecordEndpoint(buckets: Seq[Seq[Any]], lock: Object)`
[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90607/ Test PASSed.
[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21324 Merged build finished. Test PASSed.
[GitHub] spark issue #21324: [SPARK-24035][SQL] SQL syntax for Pivot - fix antlr warn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21324 **[Test build #90607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90607/testReport)** for PR 21324 at commit [`ecd3792`](https://github.com/apache/spark/commit/ecd37927ef122a75bf87f1de16d6afc80fd0bf61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19840 @yaooqinn do you plan to update this PR?
[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21153 **[Test build #4179 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4179/testReport)** for PR 21153 at commit [`ce84137`](https://github.com/apache/spark/commit/ce841372b76fe3263462b1f51ebfda26e098f8f3).
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Merged build finished. Test PASSed.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90606/ Test PASSed.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21199 **[Test build #90606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90606/testReport)** for PR 21199 at commit [`b962c3d`](https://github.com/apache/spark/commit/b962c3dbd1715b2d4fa03e65731e36697cf37ff1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90604/ Test PASSed.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Merged build finished. Test PASSed.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21208 **[Test build #90604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90604/testReport)** for PR 21208 at commit [`c2ce328`](https://github.com/apache/spark/commit/c2ce328eda03f01b58ef9c52084e671cc6720802). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user pepinoflo commented on the issue: https://github.com/apache/spark/pull/21208 Any idea about the test failure? The test name is `org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)`, and the error message is `java.lang.reflect.InvocationTargetException: null`.
[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r188107132
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
    */
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --
This method should use `withNewRDDExecutionId`.
[GitHub] spark pull request #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21323
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21323 Merging to master / 2.3.
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Merged build finished. Test PASSed.
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90602/ Test PASSed.
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3210/ Test PASSed.
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21323 Merged build finished. Test PASSed.
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21323 **[Test build #90602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90602/testReport)** for PR 21323 at commit [`fdcacd8`](https://github.com/apache/spark/commit/fdcacd8868de0aca3d13ae5ca5a9e323f114fab9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withNewExecu...
Github user sohama4 commented on the issue: https://github.com/apache/spark/pull/21316 Thanks for the approval @jaceklaskowski! Can you leave a comment so that Jenkins can get testing underway?
[GitHub] spark issue #21323: [SPARK-23852][SQL] Add withSQLConf(...) to test case
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21323 **[Test build #90608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90608/testReport)** for PR 21323 at commit [`56437da`](https://github.com/apache/spark/commit/56437da708fc12d2c9216a1365a8afd6f81af845).
[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r188104204
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
    */
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --
Why would we want to deprecate it?
[GitHub] spark issue #21323: [SPARK-23582][SQL] Add withSQLConf(...) to test case
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21323 @henryr it looks like the bug number is wrong (it should be SPARK-23852).
[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21322 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90599/ Test PASSed.
[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21322 Merged build finished. Test PASSed.
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Github user henryr closed the pull request at: https://github.com/apache/spark/pull/21302
[GitHub] spark issue #21322: [SPARK-24225][CORE] Support closing AutoClosable objects...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21322 **[Test build #90599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90599/testReport)** for PR 21322 at commit [`f254f94`](https://github.com/apache/spark/commit/f254f94fdc5e2648d7c1104bf5ec2355de7c6055). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21108: [SPARK-24027][SQL] Support MapType with StringTyp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21108
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21302 Also, please close the PR manually (GitHub doesn't do that for branches).
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21302 Merging to 2.3. In the unlikely event of issues, we can address them later.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21108 Thanks! Merged to master.
[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21300 Thanks for your confirmation.
[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21300 This is OK with me since it's turned off by default.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Merged build finished. Test FAILed.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21208 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90605/ Test FAILed.
[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21208 **[Test build #90605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90605/testReport)** for PR 21208 at commit [`703d254`](https://github.com/apache/spark/commit/703d2547cf715419c1d2eafc3d440e4eb0e7132c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21218
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90600/ Test PASSed.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21108 Merged build finished. Test PASSed.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21108 **[Test build #90600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90600/testReport)** for PR 21108 at commit [`768ef5e`](https://github.com/apache/spark/commit/768ef5ee46973d0f578437e489fc9bc622d77831). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21218: [SPARK-24155][ML] Instrumentation improvements for clust...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/21218 LGTM. Merged into master. Thanks!
[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r188081089
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -473,7 +475,8 @@ final class OnlineLDAOptimizer extends LDAOptimizer with Logging {
       None
     }
 
-    val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitions { docs =>
+    val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitionsWithIndex
--- End diff --
Fix Scala style:
```
val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitionsWithIndex {
  (index, docs) =>
    val nonEmptyDocs = docs.filter(_._2.numNonzeros > 0)

    val stat = BDM.zeros[Double](k, vocabSize)
    val logphatPartOption = logphatPartOptionBase()
    var nonEmptyDocCount: Long = 0L
    nonEmptyDocs.foreach { case (_, termCounts: Vector) =>
      nonEmptyDocCount += 1
      val (gammad, sstats, ids) = OnlineLDAOptimizer.variationalTopicInference(
        termCounts, expElogbetaBc.value, alpha, gammaShape, k, seed + index)
      stat(::, ids) := stat(::, ids) + sstats
      logphatPartOption.foreach(_ += LDAUtils.dirichletExpectation(gammad))
    }
    Iterator((stat, logphatPartOption, nonEmptyDocCount))
}
```
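The `seed + index` argument in the snippet above gives each partition its own seed derived from a single base seed. The Spark-free sketch below illustrates that idea only. `PartitionSeedSketch` and `sample` are hypothetical names, partitions are modelled as plain nested `Seq`s, and how the seed is consumed inside `variationalTopicInference` is not shown in this thread, so `scala.util.Random` stands in for it as an assumption.

```scala
import scala.util.Random

object PartitionSeedSketch {
  /**
   * Draws one pseudo-random Double per element.
   * Each partition gets its own generator seeded with seed + partition index,
   * so reruns with the same base seed reproduce the same draws.
   */
  def sample(partitions: Seq[Seq[Int]], seed: Long): Seq[Seq[Double]] =
    partitions.zipWithIndex.map { case (docs, index) =>
      val rng = new Random(seed + index)
      docs.map(_ => rng.nextDouble())
    }
}
```

With a fixed base seed the output is deterministic across runs, while different partitions do not share a generator state. That is presumably the motivation for threading the partition index through in the PR.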
[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...
Github user fangshil commented on the issue: https://github.com/apache/spark/pull/21310 I will investigate how we can add a test for this. Thoughts are welcome.
[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...
Github user superbobry commented on the issue: https://github.com/apache/spark/pull/21180 Hey @HyukjinKwon and @felixcheung, do you think the PR is good to be merged as-is, or would you like me to think further about how to make it more robust?
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21291#discussion_r188082738
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with SharedSQLContext {
     def computeChiSquareTest(): Double = {
       val n = 1
       // Trigger a sort
-      val data = spark.range(0, n, 1, 1).sort('id.desc)
+      // Range has range partitioning in its output now. To have a range shuffle, we
+      // need to run a repartition first.
+      val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
--- End diff --
Sorry, I am just curious: why is `sort('id.desc)` not causing a shuffle? Shouldn't it be ordered by `'id.asc` without the sort?
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21291#discussion_r188081824
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext {
       requiredOrdering = Seq(orderingA, orderingB),
       shouldHaveSort = true)
   }
+
+  test("SPARK-24242: RangeExec should have correct output ordering and partitioning") {
+    val df = spark.range(10)
+    val rangeExec = df.queryExecution.executedPlan.collect {
+      case r: RangeExec => r
+    }
+    val range = df.queryExecution.optimizedPlan.collect {
+      case r: Range => r
+    }
+    assert(rangeExec.head.outputOrdering == range.head.outputOrdering)
+    assert(rangeExec.head.outputPartitioning ==
+      RangePartitioning(rangeExec.head.outputOrdering, df.rdd.getNumPartitions))
+
+    val rangeInOnePartition = spark.range(1, 10, 1, 1)
+    val rangeExecInOnePartition = rangeInOnePartition.queryExecution.executedPlan.collect {
+      case r: RangeExec => r
+    }
+    assert(rangeExecInOnePartition.head.outputPartitioning == SinglePartition)
--- End diff --
Should we also add a test case for the 0 partition case?