[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21958 Thanks, merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync
Github user ceedubs commented on a diff in the pull request: https://github.com/apache/spark/pull/21971#discussion_r207246356 --- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala --- @@ -61,6 +62,36 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi (index, data) => results(index) = data, results.flatten.toSeq) } + + /** + * Returns a future of an aggregation across the RDD. + * + * @see [[RDD.aggregate]] which is the synchronous version of this method. + */ + def aggregateAsync[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): FutureAction[U] = +self.withScope { --- End diff -- In the synchronous version of `aggregate`, the `zeroValue` is cloned, which requires adding an implicit `ClassTag[U]` argument. I didn't really understand the motivation for that, so I didn't do it here, but I was hoping that someone who understood the cloning could let me know here.
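For context, the cloning in the synchronous `aggregate` exists because `zeroValue` may be mutable and is reused; Spark copies it by round-tripping it through a serializer. A hedged, Spark-free sketch of that mechanism using plain Java serialization (the helper name is illustrative, not Spark's actual `Utils.clone`; in Spark the round trip goes through `SerializerInstance`, whose `deserialize[T: ClassTag]` signature is presumably what forces the implicit `ClassTag[U]`):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Illustrative stand-in for the cloning done in RDD.aggregate: serialize and
// deserialize zeroValue so each job starts from a fresh copy, and mutation of
// the accumulator cannot leak back into the caller's zero value.
def cloneValue[U](value: U): U = {
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  out.writeObject(value.asInstanceOf[AnyRef])
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
  in.readObject().asInstanceOf[U]
}

val zero = scala.collection.mutable.ArrayBuffer(0)
val copy = cloneValue(zero)
copy += 1
// the original zero value is untouched by mutation of the clone
println(zero)
println(copy)
```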
[GitHub] spark issue #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21971 Can one of the admins verify this patch?
[GitHub] spark issue #21722: Spark-24742: Fix NullPointerexception in Field Metadata
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21722 Heh, collided, yeah. I thought one commit would end up failing, but since they're identical I guess it results in an empty commit. No big deal.
[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21236#discussion_r207247448 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { if (expected.isNaN) result.isNaN else expected == result case (result: Float, expected: Float) => if (expected.isNaN) result.isNaN else expected == result + case (result: UnsafeRow, expected: GenericInternalRow) => --- End diff -- Roger that, looks like Wenchen just did so. Thanks!
[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21632 **[Test build #94015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94015/testReport)** for PR 21632 at commit [`981d707`](https://github.com/apache/spark/commit/981d7072c4574184342868616c69bd44bc33ce3b).
[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21955 Merged build finished. Test FAILed.
[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21955 **[Test build #94012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94012/testReport)** for PR 21955 at commit [`efea0a8`](https://github.com/apache/spark/commit/efea0a889e0ff9ee226f2bd94c58817d9c96d812). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21955 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94012/ Test FAILed.
[GitHub] spark pull request #21958: [minor] remove dead code in ExpressionEvalHelper
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21958
[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21944#discussion_r207248446 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1367,6 +1367,22 @@ class Dataset[T] private[sql]( }: _*) } + /** + * Casts all the values of the current Dataset following the types of a specific StructType. + * This method works also with nested structTypes. + * + * @group typedrel + * @since 2.4.0 + */ + def castBySchema(schema: StructType): DataFrame = { + assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList), + "schema should have the same fields as the original schema") + +selectExpr(schema.map( --- End diff -- There are many good one-liner tricks, and I would just leave those in the mailing list or similar. We shouldn't add an API only because it _might be_ useful to some users. I would consider adding this if it is requested multiple times, is not a one-liner change, and has no easy workaround. Otherwise, every system will end up with an API to send email.
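The kind of one-liner workaround being alluded to might look like this: a hypothetical helper, not a Spark API, which relies on `selectExpr` and on `DataType.sql` rendering a type (including nested structs) as its SQL string:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical equivalent of the proposed castBySchema, expressed as a plain
// helper: cast each top-level column to the type declared in targetSchema,
// keeping the original column name.
def castBySchema(df: DataFrame, targetSchema: StructType): DataFrame =
  df.selectExpr(targetSchema.fields.map { f =>
    s"CAST(`${f.name}` AS ${f.dataType.sql}) AS `${f.name}`"
  }: _*)
```

Since the whole body is one `selectExpr` call, users can inline it where needed instead of Spark carrying a new public method.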
[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21632 Merged build finished. Test PASSed.
[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1669/ Test PASSed.
[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r207249569 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -340,6 +340,22 @@ class DAGScheduler( } } + /** + * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The + * following patterns are not supported: + * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg. + * union()/coalesce()/first()/PartitionPruningRDD); --- End diff -- OK, I see that it'll be a different number of partitions, but conceptually it should be OK, right? The user just wants all tasks launched together, even if it's a different number of tasks than the number of partitions in the original barrier RDD.
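To make the unsupported pattern concrete: `union` and `coalesce` both yield an RDD whose partition count differs from its ancestors', which is what the quoted check rejects for barrier stages. A minimal sketch, assuming a local SparkSession and leaving out the barrier-specific API itself:

```scala
import org.apache.spark.sql.SparkSession

// Local session just to demonstrate partition counts.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("partition-counts")
  .getOrCreate()
val sc = spark.sparkContext

val a = sc.parallelize(1 to 10, numSlices = 4)
val b = sc.parallelize(1 to 10, numSlices = 4)

// union concatenates partitions (4 + 4 = 8); coalesce shrinks them to 2.
// Either way the resulting RDD's partition count no longer matches its
// 4-partition ancestors, so a barrier stage could not launch exactly one
// task per partition of the original RDD.
println(a.union(b).getNumPartitions)    // 8
println(a.coalesce(2).getNumPartitions) // 2

spark.stop()
```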
[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r207249848 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -340,6 +340,22 @@ class DAGScheduler( } } + /** + * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The + * following patterns are not supported: + * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg. + * union()/coalesce()/first()/PartitionPruningRDD); --- End diff -- But anyway, I guess it's also fine not to support this case; I was just trying to understand it myself.
[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...
Github user mahmoudmahdi24 commented on a diff in the pull request: https://github.com/apache/spark/pull/21944#discussion_r207250060 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1367,6 +1367,22 @@ class Dataset[T] private[sql]( }: _*) } + /** + * Casts all the values of the current Dataset following the types of a specific StructType. + * This method works also with nested structTypes. + * + * @group typedrel + * @since 2.4.0 + */ + def castBySchema(schema: StructType): DataFrame = { + assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList), + "schema should have the same fields as the original schema") + +selectExpr(schema.map( --- End diff -- Ok, I understand. Thanks!
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21754 **[Test build #93993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93993/testReport)** for PR 21754 at commit [`7f98b88`](https://github.com/apache/spark/commit/7f98b885b3c6b8675790c4ba7bc79eef0958448d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93993/ Test PASSed.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Merged build finished. Test PASSed.
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21895 **[Test build #93991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93991/testReport)** for PR 21895 at commit [`c620fff`](https://github.com/apache/spark/commit/c620fff90d20ba1b62e1277317754d5f14567f79). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93991/ Test FAILed.
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Merged build finished. Test FAILed.
[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...
Github user mahmoudmahdi24 closed the pull request at: https://github.com/apache/spark/pull/21944
[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...
Github user mahmoudmahdi24 commented on the issue: https://github.com/apache/spark/pull/21944 Closed the PR. This might be a useful trick, but we want to avoid adding many methods to the API. We'll reopen this in case many users ask for it.
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 Minimum and maximum offsets wouldn't make sense for most sinks. There aren't any meaningful values to report for, e.g., writing out Parquet files. It'd make sense to put them inside just the Kafka WriterCommitMessage, but then I don't think that requires API support.
[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21953 We have seen jobs overloading the first disk returned by Yarn; unfortunately the details of those jobs have long expired. It's in general a good practice to distribute the load anyway. I remember one of the jobs was Python. For example, if you look in EvalPythonExec.scala: `// The queue used to buffer input rows so we can drain it to combine input with output from Python. val queue = HybridRowQueue(context.taskMemoryManager(), new File(Utils.getLocalDir(SparkEnv.get.conf)), child.output.length)` That call is always going to hit the disk Yarn returns first, for every container on that node.
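The gist of the proposed fix can be sketched independently of Spark: rather than always taking the first directory Yarn hands back, pick one at random so containers on the same node spread their scratch files across disks. (Hypothetical helper and paths; Spark's actual `Utils.getLocalDir` logic is more involved.)

```scala
import scala.util.Random

// Hypothetical: choose a local dir at random instead of localDirs.head,
// so every container on a node doesn't pile onto the same disk.
def chooseLocalDir(localDirs: Seq[String]): String =
  localDirs(Random.nextInt(localDirs.length))

val dirs = Seq("/disk1/yarn/local", "/disk2/yarn/local", "/disk3/yarn/local")
val chosen = chooseLocalDir(dirs)
println(chosen)
```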
[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21953 Jenkins, test this please
[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21930 @srowen thanks! So 2.12 will be optional for Spark 2.4? And the major version for Spark 3.0? What is the plan?
[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21953 **[Test build #94016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94016/testReport)** for PR 21953 at commit [`3986e75`](https://github.com/apache/spark/commit/3986e75c3c000e7a7e7674be6837d663499f35f1).
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21954 **[Test build #94002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94002/testReport)** for PR 21954 at commit [`c3bf6a0`](https://github.com/apache/spark/commit/c3bf6a0059a151ba23cf32c842e31ced3b28726c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ResolveHigherOrderFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan] ` * ` s\"its class is $` * `case class ResolveLambdaVariables(conf: SQLConf) extends Rule[LogicalPlan] ` * `case class NamedLambdaVariable(` * `case class LambdaFunction(` * `trait HigherOrderFunction extends Expression ` * `trait ArrayBasedHigherOrderFunction extends HigherOrderFunction with ExpectsInputTypes ` * `case class ArrayTransform(`
[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21930 Yes, we also need to create a 2.12 build of Spark in 2.4. We might still have to label it "beta", as I still kind of suspect there's a corner case lurking here. I can't speak for 3.0, but I would assume it would try to support 2.13 and not support 2.11.
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21954 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94002/ Test FAILed.
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21954 Merged build finished. Test FAILed.
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21935 **[Test build #94004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94004/testReport)** for PR 21935 at commit [`fed8505`](https://github.com/apache/spark/commit/fed850598ff4c52ff3c6cd54f2d3d719b8a745e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21953: [SPARK-24992][Core] spark should randomize yarn l...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21953#discussion_r207258035 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -460,7 +461,14 @@ private[spark] object Utils extends Logging { if (useCache && fetchCacheEnabled) { val cachedFileName = s"${url.hashCode}${timestamp}_cache" val lockFileName = s"${url.hashCode}${timestamp}_lock" - val localDir = new File(getLocalDir(conf)) + var localDir: File = null + // Set the cachedLocalDir for the first time and re-use it later + this.synchronized { --- End diff -- If we want to be more efficient and not hit the synchronized block each time, we could do one extra check before it to see whether `cachedLocalDir` is empty. Only if it's empty do we enter the synchronized block and then re-check whether it's still empty. This would be very similar to `getOrCreateLocalRootDirs`.
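The double-checked pattern being suggested can be sketched in plain Scala (names here are illustrative, not the actual `Utils` fields): a `@volatile` field gives the lock-free fast path, and the re-check inside `synchronized` ensures the value is computed exactly once.

```scala
object LocalDirCache {
  // Volatile so the fast path can read the cached value without locking.
  @volatile private var cachedLocalDir: String = ""

  def get(compute: () => String): String = {
    if (cachedLocalDir.isEmpty) {       // cheap unsynchronized check
      synchronized {
        if (cachedLocalDir.isEmpty) {   // re-check under the lock
          cachedLocalDir = compute()
        }
      }
    }
    cachedLocalDir
  }
}

var computeCalls = 0
val first = LocalDirCache.get(() => { computeCalls += 1; "/tmp/local-0" })
val second = LocalDirCache.get(() => { computeCalls += 1; "/tmp/local-1" })
// compute ran once; both calls observe the cached value
println((first, second, computeCalls))
```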
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21935 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94004/ Test FAILed.
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21935 Merged build finished. Test FAILed.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r207258477 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -303,94 +303,44 @@ case class LoadDataCommand( s"partitioned, but a partition spec was provided.") } } - -val loadPath = +val loadPath = { if (isLocal) { -val uri = Utils.resolveURI(path) -val file = new File(uri.getPath) -val exists = if (file.getAbsolutePath.contains("*")) { - val fileSystem = FileSystems.getDefault - val dir = file.getParentFile.getAbsolutePath - if (dir.contains("*")) { -throw new AnalysisException( - s"LOAD DATA input path allows only filename wildcard: $path") - } - - // Note that special characters such as "*" on Windows are not allowed as a path. - // Calling `WindowsFileSystem.getPath` throws an exception if there are in the path. - val dirPath = fileSystem.getPath(dir) - val pathPattern = new File(dirPath.toAbsolutePath.toString, file.getName).toURI.getPath - val safePathPattern = if (Utils.isWindows) { -// On Windows, the pattern should not start with slashes for absolute file paths. -pathPattern.stripPrefix("/") - } else { -pathPattern - } - val files = new File(dir).listFiles() - if (files == null) { -false - } else { -val matcher = fileSystem.getPathMatcher("glob:" + safePathPattern) -files.exists(f => matcher.matches(fileSystem.getPath(f.getAbsolutePath))) - } -} else { - new File(file.getAbsolutePath).exists() -} -if (!exists) { - throw new AnalysisException(s"LOAD DATA input path does not exist: $path") -} -uri +val localFS = FileContext.getLocalFSFileContext() +localFS.makeQualified(new Path(path)) } else { -val uri = new URI(path) -val hdfsUri = if (uri.getScheme() != null && uri.getAuthority() != null) { - uri -} else { - // Follow Hive's behavior: - // If no schema or authority is provided with non-local inpath, - // we will use hadoop configuration "fs.defaultFS". 
- val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS") - val defaultFS = if (defaultFSConf == null) { -new URI("") - } else { -new URI(defaultFSConf) - } - - val scheme = if (uri.getScheme() != null) { -uri.getScheme() - } else { -defaultFS.getScheme() - } - val authority = if (uri.getAuthority() != null) { -uri.getAuthority() - } else { -defaultFS.getAuthority() - } - - if (scheme == null) { -throw new AnalysisException( - s"LOAD DATA: URI scheme is required for non-local input paths: '$path'") - } - - // Follow Hive's behavior: - // If LOCAL is not specified, and the path is relative, - // then the path is interpreted relative to "/user/" - val uriPath = uri.getPath() - val absolutePath = if (uriPath != null && uriPath.startsWith("/")) { -uriPath - } else { -s"/user/${System.getProperty("user.name")}/$uriPath" - } - new URI(scheme, authority, absolutePath, uri.getQuery(), uri.getFragment()) -} -val hadoopConf = sparkSession.sessionState.newHadoopConf() -val srcPath = new Path(hdfsUri) -val fs = srcPath.getFileSystem(hadoopConf) -if (!fs.exists(srcPath)) { - throw new AnalysisException(s"LOAD DATA input path does not exist: $path") -} -hdfsUri +val loadPath = new Path(path) +// Follow Hive's behavior: +// If no schema or authority is provided with non-local inpath, +// we will use hadoop configuration "fs.defaultFS". +val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS") +val defaultFS = if (defaultFSConf == null) new URI("") else new URI(defaultFSConf) +// Follow Hive's behavior: +// If LOCAL is not specified, and the path is relative, +// then the path is interpreted relative to "/user/" +val uriPath = new Path(s"/user/${System.getProperty
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21895 retest this please
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/21754 This LGTM as a fix. However, ideally we should also support reusing an exchange used in different joins. There is no need to shuffle-write the same table twice; we just need to read it differently. For example, in one stage a reducer may read partitions 0 to 2, while in another stage a reducer may read partitions 0 to 3. We just need a different partitionStartIndices to form a different ShuffledRowRDD, and then we can reuse the Exchange. I should have addressed this in my new implementation of adaptive execution. @cloud-fan, let's pay attention to it when we review that PR.
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/21950 retest this please.
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres why wouldn't it make sense? According to the documentation, all SS sources have offsets, but not all sinks can also be SS sources; e.g. ForEach doesn't have offsets in general. So usually the offsets should be available on the sinks, no? Your expert feedback on this is very appreciated!
[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable
Github user vincent-grosbois commented on the issue: https://github.com/apache/spark/pull/21933 Hello, I updated the description and title.
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21950 **[Test build #94017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94017/testReport)** for PR 21950 at commit [`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21403 **[Test build #94001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94001/testReport)** for PR 21403 at commit [`53e3d96`](https://github.com/apache/spark/commit/53e3d961a0cde6d6ab6b4c8b86b9134b9532f776). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Merged build finished. Test FAILed.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94001/ Test FAILed.
[GitHub] spark pull request #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTask...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/21972 [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext with BarrierTaskContextImpl ## What changes were proposed in this pull request? According to https://github.com/apache/spark/pull/21758#discussion_r206746905 , current declaration of `BarrierTaskContext` didn't extend methods from `TaskContext`. Since `TaskContext` is an abstract class and we don't want to change it to a trait, we have to define class `BarrierTaskContext` directly. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark BarrierTaskContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21972.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21972 commit e5987cf281136528ec0d23f82fe1505abd6545b3 Author: Xingbo Jiang Date: 2018-08-02T15:09:37Z combine BarrierTaskContext with BarrierTaskContextImpl.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21972 **[Test build #94018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94018/testReport)** for PR 21972 at commit [`e5987cf`](https://github.com/apache/spark/commit/e5987cf281136528ec0d23f82fe1505abd6545b3).
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Merged build finished. Test PASSed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1670/ Test PASSed.
[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21936#discussion_r207269915

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
     _shutdownHookRef = ShutdownHookManager.addShutdownHook(
       ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
       logInfo("Invoking stop() from shutdown hook")
-      stop()
+      try {
+        stop()
+      } catch {
+        case e: Throwable =>
+          logWarning("Ignoring Exception while stopping SparkContext", e)
--- End diff --

minor nit, could you add in "while stopping SparkContext from shutdownhook"
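The pattern in this diff is general JVM hygiene: a shutdown hook should never let an exception escape, or the rest of shutdown can be disrupted. A minimal sketch using plain `java.lang.Runtime` (not Spark's `ShutdownHookManager`; the names here are illustrative):

```scala
// Sketch of the catch-and-log pattern from the diff above, on the bare JVM.
object ShutdownDemo {
  // Stand-in for SparkContext.stop(), which may throw during teardown.
  def stop(): Unit = throw new IllegalStateException("stop failed")

  // The hook body swallows any Throwable so the hook itself never fails.
  val hookBody: Runnable = () => {
    try {
      stop()
    } catch {
      case e: Throwable =>
        System.err.println(s"Ignoring exception while stopping from shutdown hook: $e")
    }
  }

  def install(): Unit = Runtime.getRuntime.addShutdownHook(new Thread(hookBody))
}
```

Catching `Throwable` this broadly is normally discouraged, but inside a shutdown hook there is nothing useful left to do with the error except log it.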
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 For file streams, the offsets are just indices into a log the source keeps of which files it's seen. So a file sink doesn't have any access to those offsets.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21403 **[Test build #93999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93999/testReport)** for PR 21403 at commit [`0f00a06`](https://github.com/apache/spark/commit/0f00a06a1853cb13d1d156bafcb85973c92e2b8e).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Merged build finished. Test FAILed.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93999/ Test FAILed.
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21954 retest this please
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21954 Merged build finished. Test PASSed.
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1671/ Test PASSed.
[GitHub] spark pull request #21959: [SPARK-23698] Define xrange() for Python 3 in dum...
Github user cclauss closed the pull request at: https://github.com/apache/spark/pull/21959
[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21954 **[Test build #94019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94019/testReport)** for PR 21954 at commit [`c3bf6a0`](https://github.com/apache/spark/commit/c3bf6a0059a151ba23cf32c842e31ced3b28726c).
[GitHub] spark pull request #21960: [SPARK-23698] Remove unused definitions of long a...
Github user cclauss closed the pull request at: https://github.com/apache/spark/pull/21960
[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...
Github user zecevicp commented on the issue: https://github.com/apache/spark/pull/21109 Implementing spill-over seems like a lot of work because this is a queue. If data is spilled to disk and you need to pop from the queue, it is not clear to me what the best way to do that is. Do you spill only one part of the queue (so that you can add or pop more efficiently)? Which part (the beginning or the end)? Or maybe the middle? What is the threshold for bringing it back into memory from disk? And other similar questions... But I think it can be expected that the queue will consume much less memory than the original `ExternalAppendOnlyUnsafeRowArray`, because the queue's purpose IS to reduce the number of rows in memory, so spill-over would rarely be needed (that would depend, of course, on the user's range condition). That's why implementing spill-over doesn't seem critical to me. I can try to implement it, if everybody thinks it's really needed, but as I said, it's not clear (to me) what the best approach would be. Regarding the second point, this is not an ordinary range join, but an equi-join with a secondary range condition.
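One plausible answer to "which part do you spill?" is to keep the head of the queue (the next rows to pop) in memory and push only the tail out. The sketch below is purely illustrative: the class name, the threshold, and the in-memory `spilled` queue standing in for an on-disk store are all assumptions, not anything from this PR.

```scala
import scala.collection.mutable

// Illustrative spillable FIFO queue: head stays in memory, overflow goes to
// the "spilled" tail (a real implementation would serialize it to disk).
// Requires maxInMemory >= 1.
class SpillableQueue[T](maxInMemory: Int) {
  private val head = mutable.Queue.empty[T]     // rows we can pop cheaply
  private val spilled = mutable.Queue.empty[T]  // stand-in for the on-disk tail

  def enqueue(x: T): Unit =
    if (head.size < maxInMemory) head.enqueue(x) else spilled.enqueue(x)

  def dequeue(): T = {
    val x = head.dequeue()
    // Refill the in-memory head from the spilled tail, one element at a time.
    if (spilled.nonEmpty) head.enqueue(spilled.dequeue())
    x
  }

  def size: Int = head.size + spilled.size
}
```

Because only the tail spills, `enqueue` and `dequeue` each touch the slow store at most once, which is roughly the trade-off the comment is weighing.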
[GitHub] spark issue #16486: [SPARK-13610][ML] Create a Transformer to disassemble ve...
Github user AlbertPlaPlanas commented on the issue: https://github.com/apache/spark/pull/16486 Was this ever implemented?
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21935 retest this please
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21935 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1672/ Test PASSed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21972 **[Test build #94018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94018/testReport)** for PR 21972 at commit [`e5987cf`](https://github.com/apache/spark/commit/e5987cf281136528ec0d23f82fe1505abd6545b3).

* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94018/ Test FAILed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Merged build finished. Test FAILed.
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21935 **[Test build #94020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94020/testReport)** for PR 21935 at commit [`fed8505`](https://github.com/apache/spark/commit/fed850598ff4c52ff3c6cd54f2d3d719b8a745e7).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 These test failures are in Spark streaming. Is this just an intermittent test failure or actually caused by this PR?
[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...
Github user hthuynh2 commented on the issue: https://github.com/apache/spark/pull/21936 @tgravescs I updated. Thanks.
[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21936 **[Test build #94021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94021/testReport)** for PR 21936 at commit [`a328163`](https://github.com/apache/spark/commit/a328163c97c9328a85e6415a716c130de9892b16).
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21754 **[Test build #94000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94000/testReport)** for PR 21754 at commit [`5dfd948`](https://github.com/apache/spark/commit/5dfd94843ff776e75e0c0fb5198f36bfebf94288).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 Yes, I was hoping to improve that, e.g. by using the filename as the offset or another non-consumer-owned approach, but that would be rather long term. Do you think it is solvable?
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Merged build finished. Test FAILed.
[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...
Github user hthuynh2 commented on the issue: https://github.com/apache/spark/pull/21953 @tgravescs I updated it. Thanks.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94000/ Test FAILed.
[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r207281949

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ---
@@ -107,7 +109,14 @@ private[spark] class Client(
   def run(): Unit = {
     val resolvedDriverSpec = builder.buildFromFeatures(kubernetesConf)
     val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map"
-    val configMap = buildConfigMap(configMapName, resolvedDriverSpec.systemProperties)
+    val isKerberosEnabled = kubernetesConf.getTokenManager.isSecurityEnabled
+    // HADOOP_SECURITY_AUTHENTICATION is defined as simple for the driver and executors as
+    // they need only the delegation token to access secure HDFS, no need to sign in to Kerberos
+    val maybeSimpleAuthentication =
+      if (isKerberosEnabled) Some((s"-D$HADOOP_SECURITY_AUTHENTICATION", "simple")) else None
--- End diff --

I agree that the use cases presented above require Kerberos login on the driver and executors. I will address these concerns in my followup commit.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21754 retest this please
[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api
Github user squito commented on the issue: https://github.com/apache/spark/pull/21923 retest this please
[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21970 **[Test build #94013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94013/testReport)** for PR 21970 at commit [`6273831`](https://github.com/apache/spark/commit/6273831d38069731fdd689c03ce078e6158db2a4).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94013/ Test FAILed.
[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21970 Merged build finished. Test FAILed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Merged build finished. Test PASSed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21972 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1673/ Test PASSed.
[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1674/ Test PASSed.
[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21923 Merged build finished. Test PASSed.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1675/ Test PASSed.
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21754 Merged build finished. Test PASSed.
[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21972 **[Test build #94022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94022/testReport)** for PR 21972 at commit [`f3ea13d`](https://github.com/apache/spark/commit/f3ea13d68736cf445d2d72f66cbb2d082a7853bc).
[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21923 **[Test build #94023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94023/testReport)** for PR 21923 at commit [`ba6aa6c`](https://github.com/apache/spark/commit/ba6aa6c829bfcca1b4b3d5a33fe3a7460e7db1f0).
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21754 **[Test build #94024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94024/testReport)** for PR 21754 at commit [`5dfd948`](https://github.com/apache/spark/commit/5dfd94843ff776e75e0c0fb5198f36bfebf94288).
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21305 @cloud-fan, I'll fix the conflicts and re-run tests. Yesterday's tests passed after I updated for your feedback. I'd like to try to get this in soon because it is taking so much time to resolve conflicts without any real changes. FYI @gatorsmile, @bersprockets, @jzhuge
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21941 **[Test build #94003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94003/testReport)** for PR 21941 at commit [`e7d69db`](https://github.com/apache/spark/commit/e7d69db7cd0c23d6ee9012b5f48b17e5aeac8d66).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21930 Sure, @lrytz can have a second look at this; it also needs to be battle-tested.
[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce and Repartition Hint ...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21911#discussion_r207285266

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -102,6 +104,35 @@ object ResolveHints {
     }
   }

+  /**
+   * COALESCE Hint accepts name "COALESCE" and "REPARTITION".
+   * Its parameter includes a partition number.
+   */
+  class ResolveCoalesceHints(conf: SQLConf) extends Rule[LogicalPlan] {
+    private val COALESCE_HINT_NAMES = Set("COALESCE", "REPARTITION")
+
+    private def applyCoalesceHint(
+        plan: LogicalPlan,
+        numPartitions: Int,
+        shuffle: Boolean): LogicalPlan = {
+      Repartition(numPartitions, shuffle, plan)
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
+      case h: UnresolvedHint if COALESCE_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
+        h.parameters match {
+          case Seq(Literal(numPartitions: Int, IntegerType)) =>
+            val shuffle = h.name.toUpperCase(Locale.ROOT) match {
+              case "REPARTITION" => true
+              case "COALESCE" => false
+            }
+            applyCoalesceHint(h.child, numPartitions, shuffle)
+          case _ =>
+            throw new AnalysisException("COALESCE Hint expects a partition number as parameter")
--- End diff --

Can you use `h.name.toUpperCase` in this error message instead? I think that would be a better message for users who don't know the relationship between COALESCE and REPARTITION.
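The reviewer's suggestion amounts to building the message from the hint's own name so a user who wrote `REPARTITION` sees "REPARTITION" in the error. A minimal, self-contained sketch of that change (the object and method names are illustrative, not what was committed to Spark):

```scala
import java.util.Locale

// Illustrative version of the suggested fix: interpolate the hint name
// instead of hard-coding "COALESCE" in the error message.
object HintErrorDemo {
  def badParameterMessage(hintName: String): String =
    s"${hintName.toUpperCase(Locale.ROOT)} Hint expects a partition number as parameter"
}
```

Using `Locale.ROOT` matches the case-insensitive name matching elsewhere in the rule and avoids locale-dependent surprises (e.g. the Turkish dotless-i problem with `toUpperCase`).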
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21941 Merged build finished. Test FAILed.