[GitHub] spark issue #19528: [SPARK-20393][WEBU UI][1.6] Strengthen Spark to prevent ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19528 **[Test build #86410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86410/consoleFull)** for PR 19528 at commit [`76ad8c5`](https://github.com/apache/spark/commit/76ad8c5e62a7233c16399043716139b52ee1c97d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19528: [SPARK-20393][WEBU UI][1.6] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19528 Jenkins test this please
[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r162776148

Diff: sql/core/src/test/resources/sql-tests/results/operators.sql.out

```
@@ -233,7 +233,7 @@ struct
 -- !query 28 output
 == Physical Plan ==
 *Project [null AS (CAST(concat(a, CAST(1 AS STRING)) AS DOUBLE) + CAST(2 AS DOUBLE))#x]
-+- Scan OneRowRelation[]
++- Scan Scan RDD OneRowRelation [][]
```

End diff

?
[GitHub] spark pull request #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20087#discussion_r162776118

Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala (new file; the quoted hunk is truncated below)

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hive

import java.io.File

import scala.collection.JavaConverters._

import org.apache.hadoop.fs.Path
import org.apache.orc.OrcConf.COMPRESS
import org.apache.parquet.hadoop.ParquetOutputFormat
import org.scalatest.BeforeAndAfterAll

import org.apache.spark.sql.execution.datasources.orc.OrcOptions
import org.apache.spark.sql.execution.datasources.parquet.{ParquetOptions, ParquetTest}
import org.apache.spark.sql.hive.orc.OrcFileOperator
import org.apache.spark.sql.hive.test.TestHiveSingleton
import org.apache.spark.sql.internal.SQLConf

class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with BeforeAndAfterAll {
  import spark.implicits._

  override def beforeAll(): Unit = {
    super.beforeAll()
    (0 until maxRecordNum).toDF("a").createOrReplaceTempView("table_source")
  }

  override def afterAll(): Unit = {
    try {
      spark.catalog.dropTempView("table_source")
    } finally {
      super.afterAll()
    }
  }

  private val maxRecordNum = 50

  private def getConvertMetastoreConfName(format: String): String = format.toLowerCase match {
    case "parquet" => HiveUtils.CONVERT_METASTORE_PARQUET.key
    case "orc" => HiveUtils.CONVERT_METASTORE_ORC.key
  }

  private def getSparkCompressionConfName(format: String): String = format.toLowerCase match {
    case "parquet" => SQLConf.PARQUET_COMPRESSION.key
    case "orc" => SQLConf.ORC_COMPRESSION.key
  }

  private def getHiveCompressPropName(format: String): String = format.toLowerCase match {
    case "parquet" => ParquetOutputFormat.COMPRESSION
    case "orc" => COMPRESS.getAttribute
  }

  private def normalizeCodecName(format: String, name: String): String = {
    format.toLowerCase match {
      case "parquet" => ParquetOptions.getParquetCompressionCodecName(name)
      case "orc" => OrcOptions.getORCCompressionCodecName(name)
    }
  }

  private def getTableCompressionCodec(path: String, format: String): Seq[String] = {
    val hadoopConf = spark.sessionState.newHadoopConf()
    val codecs = format.toLowerCase match {
      case "parquet" => for {
        footer <- readAllFootersWithoutSummaryFiles(new Path(path), hadoopConf)
        block <- footer.getParquetMetadata.getBlocks.asScala
        column <- block.getColumns.asScala
      } yield column.getCodec.name()
      case "orc" => new File(path).listFiles().filter { file =>
        file.isFile && !file.getName.endsWith(".crc") && file.getName != "_SUCCESS"
      }.map { orcFile =>
        OrcFileOperator.getFileReader(orcFile.toPath.toString).get.getCompression.toString
      }.toSeq
    }
    codecs.distinct
  }

  private def createTable(
      rootDir: File,
      tableName: String,
      isPartitioned: Boolean,
      format: String,
      compressionCodec: Option[String]): Unit = {
    val tblProperties = compressionCodec match {
      case Some(prop) => s"TBLPROPERTIES('${getHiveCompressPropName(format)}'='$prop')"
      case _ => ""
    }
    val partitionCreate = if (isPartitioned) "PARTITIONED BY (p string)" else ""
    sql(
      s"""
         |CREATE TABLE $tableName(a int)
         |$partitionCreate
         |STORED AS $format
         |LOCATION '${rootDir.toURI.toString.stripSuffix("/")}/$tableName'
         |$tblProperties
       """.stripMargin)
  }

  private def writeDataToTable(
      tableName: String,
```
[GitHub] spark pull request #20324: [SPARK-23091][ML] Incorrect unit test for approxQ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20324
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20324 Thanks! Merged to master/2.3
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20324 LGTM
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20330 cc @gengliangwang
[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19993 OK, sent: https://github.com/mgaido91/spark/pull/1
[GitHub] spark issue #20287: [SPARK-23121][WEB-UI] When the Spark Streaming app is ru...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/20287 @smurakozi @vanzin @srowen Thanks, I will close the PR.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #86409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86409/testReport)** for PR 20087 at commit [`5b5e1df`](https://github.com/apache/spark/commit/5b5e1df983af6ff03ec6ef6c83208c8b25af93e2).
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4068/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Merged build finished. Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86405/ Test PASSed.
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86406/ Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Merged build finished. Test PASSed.
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19054 **[Test build #86406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86406/testReport)** for PR 19054 at commit [`00bb14b`](https://github.com/apache/spark/commit/00bb14b0145a2bd42c8b4c8a9d4f108322804f71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #86405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86405/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86407/ Test FAILed.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20226 Merged build finished. Test FAILed.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20226 **[Test build #86407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86407/testReport)** for PR 20226 at commit [`bf90ac7`](https://github.com/apache/spark/commit/bf90ac713f1ea909572486b136c44f9e4badc50c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #86408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86408/testReport)** for PR 20203 at commit [`f388c45`](https://github.com/apache/spark/commit/f388c45ee56c17f48d393240f29901f73865bb74).
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86404/ Test FAILed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Merged build finished. Test FAILed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86404/testReport)** for PR 20330 at commit [`f19d3a1`](https://github.com/apache/spark/commit/f19d3a1dce67cb8af682c1de9bd41411be1d8b0d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86403/ Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test FAILed.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20226 **[Test build #86407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86407/testReport)** for PR 20226 at commit [`bf90ac7`](https://github.com/apache/spark/commit/bf90ac713f1ea909572486b136c44f9e4badc50c).
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20226 Merged build finished. Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86403/testReport)** for PR 20331 at commit [`b83f859`](https://github.com/apache/spark/commit/b83f859137ca9ed33c3c7e4295c433b7bbca6eee). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils ` * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest with SharedSQLContext ` * `abstract class OrcHadoopFsRelationBase extends HadoopFsRelationTest ` * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest with SharedSQLContext ` * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationBase with TestHiveSingleton `
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20226 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/54/ Test PASSed.
[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r162769562

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala

```
@@ -30,6 +30,8 @@ case class LocalTableScanExec(
     output: Seq[Attribute],
     @transient rows: Seq[InternalRow]) extends LeafExecNode {
 
+  override val nodeName: String = s"Scan LocalTable ${output.map(_.name).mkString("[", ",", "]")}"
```

End diff

I believe you are referring to the duplication at: https://github.com/apache/spark/blob/3f958a99921d149fb9fdf7ba7e78957afdad1405/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L466

```
def simpleString: String = s"$nodeName $argString".trim
```

Am changing this line to just have `Scan LocalTable`
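The interaction behind this review thread — `simpleString` prefixing `nodeName` onto `argString` — can be sketched with a minimal standalone model (this `Node` class is a hypothetical stand-in, not Spark's actual `TreeNode`):

```scala
// Minimal model of the quoted TreeNode.simpleString line. If nodeName already
// embeds the output list (e.g. "Scan LocalTable [a,b]") and argString renders
// the same list, the plan line duplicates it -- hence the fix to drop the
// list from nodeName.
case class Node(nodeName: String, argString: String) {
  def simpleString: String = s"$nodeName $argString".trim
}

// With the output list baked into nodeName, the rendered line repeats it.
val duplicated = Node("Scan LocalTable [a,b]", "[a,b]").simpleString
// With the shorter nodeName, argString supplies the list exactly once.
val fixed = Node("Scan LocalTable", "[a,b]").simpleString
```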
[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768714

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

```
@@ -271,23 +325,24 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
    */
   private def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
     plan.transformUp {
-      case BroadcastHashJoinExec(leftKeys, rightKeys, joinType, buildSide, condition, left,
```

End diff

Removal of `BroadcastHashJoinExec` is intentional. The children are expected to have `BroadcastDistribution` or `UnspecifiedDistribution`, so this method won't help here (this optimization only helps in the case of shuffle-based joins).
[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768516

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

```
@@ -220,45 +220,76 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
     operator.withNewChildren(children)
   }
 
+  private def isSubset(biggerSet: Seq[Expression], smallerSet: Seq[Expression]): Boolean =
+    smallerSet.length <= biggerSet.length &&
+      smallerSet.forall(x => biggerSet.exists(_.semanticEquals(x)))
+
   private def reorder(
       leftKeys: Seq[Expression],
       rightKeys: Seq[Expression],
-      expectedOrderOfKeys: Seq[Expression],
-      currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
-    val leftKeysBuffer = ArrayBuffer[Expression]()
-    val rightKeysBuffer = ArrayBuffer[Expression]()
+      expectedOrderOfKeys: Seq[Expression], // comes from child's output partitioning
+      currentOrderOfKeys: Seq[Expression]): // comes from join predicate
+    (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
+
+    assert(leftKeys.length == rightKeys.length)
+
+    val allLeftKeys = ArrayBuffer[Expression]()
+    val allRightKeys = ArrayBuffer[Expression]()
+    val reorderedLeftKeys = ArrayBuffer[Expression]()
+    val reorderedRightKeys = ArrayBuffer[Expression]()
+    val processedIndicies = mutable.Set[Int]()
     expectedOrderOfKeys.foreach(expression => {
-      val index = currentOrderOfKeys.indexWhere(e => e.semanticEquals(expression))
-      leftKeysBuffer.append(leftKeys(index))
-      rightKeysBuffer.append(rightKeys(index))
+      val index = currentOrderOfKeys.zipWithIndex.find { case (currKey, i) =>
+        !processedIndicies.contains(i) && currKey.semanticEquals(expression)
+      }.get._2
+      processedIndicies.add(index)
+
+      reorderedLeftKeys.append(leftKeys(index))
+      allLeftKeys.append(leftKeys(index))
+
+      reorderedRightKeys.append(rightKeys(index))
+      allRightKeys.append(rightKeys(index))
     })
-    (leftKeysBuffer, rightKeysBuffer)
+
+    // If len(currentOrderOfKeys) > len(expectedOrderOfKeys), then the re-ordering won't have
+    // all the keys. Append the remaining keys to the end so that we are covering all the keys
+    for (i <- leftKeys.indices) {
+      if (!processedIndicies.contains(i)) {
+        allLeftKeys.append(leftKeys(i))
+        allRightKeys.append(rightKeys(i))
+      }
+    }
+
+    assert(allLeftKeys.length == leftKeys.length)
+    assert(allRightKeys.length == rightKeys.length)
+    assert(reorderedLeftKeys.length == reorderedRightKeys.length)
+
+    (allLeftKeys, reorderedLeftKeys, allRightKeys, reorderedRightKeys)
   }
 
   private def reorderJoinKeys(
       leftKeys: Seq[Expression],
       rightKeys: Seq[Expression],
       leftPartitioning: Partitioning,
-      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+      rightPartitioning: Partitioning):
+    (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {
+
     if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) {
       leftPartitioning match {
-        case HashPartitioning(leftExpressions, _)
-          if leftExpressions.length == leftKeys.length &&
-            leftKeys.forall(x => leftExpressions.exists(_.semanticEquals(x))) =>
+        case HashPartitioning(leftExpressions, _) if isSubset(leftKeys, leftExpressions) =>
           reorder(leftKeys, rightKeys, leftExpressions, leftKeys)
```

End diff

Given that this was only done over `SortMergeJoinExec` and `ShuffledHashJoinExec`, where both the partitionings are `HashPartitioning`, things worked fine. I have changed this to have a stricter check.
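The duplicate-safe matching in the hunk above (a processed-index set so a repeated join key is never matched to the same position twice, plus appending any leftover key pairs at the end) can be sketched with plain strings standing in for Spark's `Expression`s — an illustrative sketch, not the PR's code:

```scala
import scala.collection.mutable

// Reorder (leftKeys, rightKeys) to follow expectedOrder, matching each
// expected key to the first *unused* index in currentOrder (duplicate-safe),
// then append the remaining key pairs so no key is dropped.
def reorderKeys(
    leftKeys: Seq[String],
    rightKeys: Seq[String],
    expectedOrder: Seq[String],
    currentOrder: Seq[String]): (Seq[String], Seq[String]) = {
  require(leftKeys.length == rightKeys.length)
  val processed = mutable.Set[Int]()
  val outLeft = mutable.ArrayBuffer[String]()
  val outRight = mutable.ArrayBuffer[String]()
  expectedOrder.foreach { key =>
    // First index in currentOrder that matches this key and is not yet used.
    val idx = currentOrder.zipWithIndex
      .collectFirst { case (k, i) if !processed(i) && k == key => i }
      .get
    processed += idx
    outLeft += leftKeys(idx)
    outRight += rightKeys(idx)
  }
  // Append the key pairs not covered by expectedOrder.
  leftKeys.indices.filterNot(processed).foreach { i =>
    outLeft += leftKeys(i)
    outRight += rightKeys(i)
  }
  (outLeft.toSeq, outRight.toSeq)
}

// Repeated key "a" appears twice in the join predicate; each occurrence of
// "a" in expectedOrder consumes a distinct index, and "b" is appended last.
val result = reorderKeys(
  leftKeys = Seq("a", "b", "a"),
  rightKeys = Seq("x", "y", "z"),
  expectedOrder = Seq("a", "a"),
  currentOrder = Seq("a", "b", "a"))
```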
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19054 **[Test build #86406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86406/testReport)** for PR 19054 at commit [`00bb14b`](https://github.com/apache/spark/commit/00bb14b0145a2bd42c8b4c8a9d4f108322804f71).
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Merged build finished. Test PASSed.
[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/53/ Test PASSed.
[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768446

(Quotes the same `EnsureRequirements.scala` hunk shown in the earlier review comment on this PR, anchored at the new four-tuple return type of `reorderJoinKeys`.)

Added more doc. I wasn't sure how to make it easier to understand. Hope that the example helps with that.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #86405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86405/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/52/ Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Merged build finished. Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4068 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4068/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20297 retest this please
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162764494

Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala

```
@@ -1002,4 +1000,12 @@ private object ApiHelper {
     }
   }
 
+  def lastStageNameAndDescription(store: AppStatusStore, job: JobData): (String, String) = {
+    store.asOption(store.lastStageAttempt(job.stageIds.max)) match {
+      case Some(lastStageAttempt) =>
+        (lastStageAttempt.name, lastStageAttempt.description.getOrElse(job.name))
+      case None => ("", "")
```

End diff

This would probably be simpler:

```
val stage = store.asOption(...)
(stage.map(_.name).getOrElse(""), stage.map(_.description.getOrElse(job.name)))
```
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20330 You could also add 'Closes #20287' to the PR description to close the other PR for the same bug automatically.
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162762976 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -1002,4 +1000,12 @@ private object ApiHelper { } } + def lastStageNameAndDescription(store: AppStatusStore, job: JobData): (String, String) = { +store.asOption(store.lastStageAttempt(job.stageIds.max)) match { + case Some(lastStageAttempt) => +(lastStageAttempt.name, lastStageAttempt.description.getOrElse(job.name)) + case None => ("", "") --- End diff -- Before, you were doing `if (lastStageDescription.isEmpty) job.name else blah` at the call site. Now, when the last stage is not in the store, the call site is getting an empty string as the description, instead of using the job name.
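The behaviour vanzin is asking for can be sketched in plain Scala. The `StageAttempt` and `Job` case classes below are simplified stand-ins for Spark's stage/job data (not the real types), so this is only an illustration of the fallback logic under discussion, not the actual patch:

```scala
// Simplified model of the name/description fallback:
// - last stage attempt missing from the store -> ("", "")
// - attempt present but description missing   -> fall back to the job name
case class StageAttempt(name: String, description: Option[String])
case class Job(name: String)

def lastStageNameAndDescription(stage: Option[StageAttempt], job: Job): (String, String) =
  (stage.map(_.name).getOrElse(""),
   stage.map(_.description.getOrElse(job.name)).getOrElse(""))

// Attempt and description present: both used as-is.
assert(lastStageNameAndDescription(Some(StageAttempt("s1", Some("d1"))), Job("j")) == ("s1", "d1"))
// Description missing: the job name is used instead of "".
assert(lastStageNameAndDescription(Some(StageAttempt("s1", None)), Job("j")) == ("s1", "j"))
// Attempt missing entirely: empty strings.
assert(lastStageNameAndDescription(None, Job("j")) == ("", ""))
```

The outer `getOrElse("")` is what the one-liner quoted in the review needs so that both tuple elements are `String` rather than `Option[String]`.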
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20324 LGTM. Thanks! 👍
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20203 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86400/ Test FAILed.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20203 Merged build finished. Test FAILed.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #86400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86400/testReport)** for PR 20203 at commit [`cf6e0c9`](https://github.com/apache/spark/commit/cf6e0c919e151c26772ec78a10abc6d2454f7dd5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86399/ Test PASSed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Merged build finished. Test PASSed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86399/testReport)** for PR 20330 at commit [`d5fdabb`](https://github.com/apache/spark/commit/d5fdabb678f4df7c101d8660cb7c37086e35489a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20324 cc @WeichenXu123
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86404/testReport)** for PR 20330 at commit [`f19d3a1`](https://github.com/apache/spark/commit/f19d3a1dce67cb8af682c1de9bd41411be1d8b0d).
[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20325#discussion_r162760318 --- Diff: docs/sql-programming-guide.md --- @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as long as you maintain your connection to the same metastore. A DataFrame for a persistent table can be created by calling the `table` method on a `SparkSession` with the name of the table. +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`. --- End diff -- Let us get rid of `Notice that for DataFrames is built on Hive table,`. `insertInto` can work for any existing table. More importantly, `DataFrames` might be created from scratch.
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20046 We shall also cover the SQL interface; you can find some examples in `sql/core/src/test/resources/sql-tests/inputs/udaf.sql`
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162759866 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -427,23 +435,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) --- End diff -- I've moved this logic to `lastStageNameAndDescription`, so it's uniform.
[GitHub] spark pull request #18983: [SPARK-21771][SQL]remove useless hive client in S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18983
[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18983 Thanks! Merged to master/2.3
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r162759216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -86,6 +93,429 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { assert(e.message.contains("requires window to be ordered")) } + test("aggregation and rows between") { +val df = Seq((1, "1"), (2, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") --- End diff -- We shall also include null data.
[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20333#discussion_r162759088 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper { */ def isCartesianProduct(join: Join): Boolean = { val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil) -!conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains)) + +conditions match { + case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false + case _ => !conditions.map(_.references).exists(refs => +refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains)) +} } def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.crossJoinEnabled) { plan } else plan transform { - case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition) + case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _) --- End diff -- For inner joins, we will not hit this, because it is already optimized to an empty relation. For the other outer join types, we face exactly the same issue as when the condition is true. That is, the size of the join result set is still the same.
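The shape of the rule change can be modelled without Spark. The tiny condition ADT below is a hypothetical stand-in for Catalyst's `Expression` tree; it only illustrates the pattern match being discussed — a join whose sole condition is a literal `false` or `null` produces an empty result set, so it is not a cartesian product and should not be flagged:

```scala
sealed trait Cond
case object FalseLit extends Cond                          // stands in for Literal.FalseLiteral
case object NullLit extends Cond                           // stands in for Literal(null, BooleanType)
case class ColEq(left: String, right: String) extends Cond // stands in for an equi-join predicate

// Flag a join as cartesian only when no condition references both sides,
// except when the only condition is a constant false/null (empty result).
def isCartesianProduct(conditions: Seq[Cond],
                       leftCols: Set[String],
                       rightCols: Set[String]): Boolean =
  conditions match {
    case Seq(FalseLit) | Seq(NullLit) => false
    case _ =>
      !conditions.exists {
        case ColEq(l, r) => leftCols(l) && rightCols(r)
        case _           => false
      }
  }

assert(!isCartesianProduct(Seq(FalseLit), Set("a"), Set("b")))        // false condition: not flagged
assert(!isCartesianProduct(Seq(NullLit), Set("a"), Set("b")))         // null condition: not flagged
assert(isCartesianProduct(Nil, Set("a"), Set("b")))                   // no condition: cartesian
assert(!isCartesianProduct(Seq(ColEq("a", "b")), Set("a"), Set("b"))) // real join condition
```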
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20333 Merged build finished. Test FAILed.
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20333 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86401/ Test FAILed.
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20333 **[Test build #86401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86401/testReport)** for PR 20333 at commit [`9c88781`](https://github.com/apache/spark/commit/9c88781dcd4cd301373927bfbe7f3530c80f4f05). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20333#discussion_r162758553 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala --- @@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext { checkAnswer(innerJoin, Row(1) :: Nil) } + test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " + +"is false or null") { +val df = spark.range(10) +val dfNull = spark.range(10).select(lit(null).as("b")) +val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed + +spark.sessionState.executePlan(planNull).optimizedPlan + +val dfOne = df.select(lit(1).as("a")) +val dfTwo = spark.range(10).select(lit(2).as("a")) --- End diff -- `a` -> `b`
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86403/testReport)** for PR 20331 at commit [`b83f859`](https://github.com/apache/spark/commit/b83f859137ca9ed33c3c7e4295c433b7bbca6eee).
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/51/ Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20331 retest this please
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86402/ Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86402/testReport)** for PR 20331 at commit [`b83f859`](https://github.com/apache/spark/commit/b83f859137ca9ed33c3c7e4295c433b7bbca6eee). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils ` * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest with SharedSQLContext ` * `abstract class OrcHadoopFsRelationBase extends HadoopFsRelationTest ` * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest with SharedSQLContext ` * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationBase with TestHiveSingleton `
[GitHub] spark pull request #20316: [SPARK-23149][SQL] polish ColumnarBatch
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20316#discussion_r162753323 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java --- @@ -96,16 +90,6 @@ public void setNumRows(int numRows) { */ public int numRows() { return numRows; } - /** - * Returns the schema that makes up this batch. - */ - public StructType schema() { return schema; } - - /** - * Returns the max capacity (in number of rows) for this batch. - */ - public int capacity() { return capacity; } --- End diff -- I agree to remove these fields `schema` and `capacity` from `ColumnarBatch`. Is it better to prepare APIs to get `schema` and `capacity` from a set of `ColumnVector`s?
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20203 @attilapiros test failures look real (you probably just need to regenerate some of those expectations).
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20203 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86398/ Test FAILed.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20203 Merged build finished. Test FAILed.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #86398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86398/testReport)** for PR 20203 at commit [`41dd7bb`](https://github.com/apache/spark/commit/41dd7bbc1f62e093738e730bf3f5bfeb3dff16fb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerNodeBlacklistedForStage(`
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/50/ Test PASSed.
[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20177 We shall also update for `AnalyzePartitionCommand` and `AnalyzeColumnCommand`.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86402/testReport)** for PR 20331 at commit [`b83f859`](https://github.com/apache/spark/commit/b83f859137ca9ed33c3c7e4295c433b7bbca6eee).
[GitHub] spark pull request #20177: [SPARK-22954][SQL] Fix the exception thrown by An...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20177#discussion_r162749089 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala --- @@ -154,11 +155,17 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils { assertNoSuchTable(s"TRUNCATE TABLE $viewName") assertNoSuchTable(s"SHOW CREATE TABLE $viewName") assertNoSuchTable(s"SHOW PARTITIONS $viewName") - assertNoSuchTable(s"ANALYZE TABLE $viewName COMPUTE STATISTICS") - assertNoSuchTable(s"ANALYZE TABLE $viewName COMPUTE STATISTICS FOR COLUMNS id") + assertAnalysisException(s"ANALYZE TABLE $viewName COMPUTE STATISTICS") --- End diff -- We should also check the error message to ensure the `AnalysisException` is not thrown from elsewhere.
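The suggestion — assert on the message, not just the exception type — can be sketched like this. `AnalysisException` and `assertAnalysisException` here are simplified stand-ins for Spark's class and the suite's helper, shown only to illustrate the pattern:

```scala
class AnalysisException(msg: String) extends Exception(msg)

// Fail unless the body throws an AnalysisException whose message contains the
// expected fragment; matching on the type alone could pass for an exception
// thrown from an unrelated code path.
def assertAnalysisException(expectedMsg: String)(body: => Unit): Unit = {
  val thrown =
    try { body; None }
    catch { case e: AnalysisException => Some(e) }
  assert(thrown.isDefined, "expected an AnalysisException")
  assert(thrown.get.getMessage.contains(expectedMsg),
    s"unexpected message: ${thrown.get.getMessage}")
}

// Passes: right type, and the message contains the expected fragment.
assertAnalysisException("not allowed on views") {
  throw new AnalysisException("ANALYZE TABLE is not allowed on views")
}
```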
[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20277#discussion_r162748352 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java --- @@ -55,164 +43,82 @@ public void close() { if (childColumns != null) { for (int i = 0; i < childColumns.length; i++) { childColumns[i].close(); +childColumns[i] = null; --- End diff -- Is it OK not to call `close()` while `ColumnVector.close()` is provided?
[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20277#discussion_r162747998 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java --- @@ -33,18 +33,6 @@ private final ArrowVectorAccessor accessor; private ArrowColumnVector[] childColumns; - private void ensureAccessible(int index) { -ensureAccessible(index, 1); - } - - private void ensureAccessible(int index, int count) { --- End diff -- I agree with this in non-debug version. Can we add assert of this check at each caller site for debugging? p.s. Sorry for slow reviews since I am on vacation this week.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4067/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162747183 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -201,9 +184,13 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String @Since("1.4.0") override def transformSchema(schema: StructType): StructType = { -if (isBucketizeMultipleColumns()) { +ParamValidators.checkExclusiveParams(this, "inputCol", "inputCols") --- End diff -- I see. I'll see if I can come up with something which is generic but handles these other checks.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4066/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20177: [SPARK-22954][SQL] Fix the exception thrown by An...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20177#discussion_r162746808 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -31,9 +31,9 @@ case class AnalyzeTableCommand( override def run(sparkSession: SparkSession): Seq[Row] = { val sessionState = sparkSession.sessionState -val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase) -val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db)) -val tableMeta = sessionState.catalog.getTableMetadata(tableIdentWithDB) +val db = tableIdent.database +val tableIdentWithDB = TableIdentifier(tableIdent.table, db) +val tableMeta = sessionState.catalog.getTempViewOrPermanentTableMetadata(tableIdentWithDB) --- End diff -- Wouldn't this fail if we have a table that omits the current database in tableIdent?
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86395/ Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Merged build finished. Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #86395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86395/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Merged build finished. Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86394/ Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20277 **[Test build #86394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86394/testReport)** for PR 20277 at commit [`3972093`](https://github.com/apache/spark/commit/397209342646a253a56650df8a00dfb6d66c834e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20333 **[Test build #86401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86401/testReport)** for PR 20333 at commit [`9c88781`](https://github.com/apache/spark/commit/9c88781dcd4cd301373927bfbe7f3530c80f4f05).
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20333 Merged build finished. Test PASSed.
[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/49/ Test PASSed.
[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too rest...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/20333 [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null ## What changes were proposed in this pull request? CheckCartesianProduct also raises an AnalysisException when the join condition is always false/null. In this case, we shouldn't raise it, since the result will not be a cartesian product. ## How was this patch tested? added UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-23087 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20333.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20333 commit 9c88781dcd4cd301373927bfbe7f3530c80f4f05 Author: Marco Gaido Date: 2018-01-19T20:45:29Z [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null
[GitHub] spark pull request #20299: [SPARK-23135][ui] Fix rendering of accumulators i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20299