[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23231 **[Test build #99712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99712/testReport)** for PR 23231 at commit [`453d60f`](https://github.com/apache/spark/commit/453d60f42b99de621a7ee3fab6bc6138fc20ed05). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5776/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99736/testReport)** for PR 23207 at commit [`76d1ca0`](https://github.com/apache/spark/commit/76d1ca0036bbb50a005e9d12f8b22bf21697af7f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23238 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5775/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23238 **[Test build #99732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99732/testReport)** for PR 23238 at commit [`5bbcf41`](https://github.com/apache/spark/commit/5bbcf41f34f2ca160da7ef4ebe4c54d15a2d09b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99732/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/23223 The approach here makes sense. Are you seeing actual issues where this blacklists when it shouldn't? I could see that being possible here, and if so we should move this to a defect and make sure it goes into 2.4.1.
[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark
Github user stczwd commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r239113033
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -631,6 +631,33 @@ object SQLConf {
     .intConf
     .createWithDefault(200)
 
+  val SQLSTREAM_WATERMARK_ENABLE = buildConf("spark.sqlstreaming.watermark.enable")
+    .doc("Whether use watermark in sqlstreaming.")
+    .booleanConf
+    .createWithDefault(false)
+
+  val SQLSTREAM_OUTPUTMODE = buildConf("spark.sqlstreaming.outputMode")
+    .doc("The output mode used in sqlstreaming")
+    .stringConf
+    .createWithDefault("append")
+
+  val SQLSTREAM_TRIGGER = buildConf("spark.sqlstreaming.trigger")
--- End diff --
I don't think there are any problems with this. SQLStreaming uses a Command to run the streaming query, which is similar to InsertIntoHiveTable. In addition, an application can currently run only one streaming SQL query. Therefore, the batch SQL and streaming SQL solution is expected.
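To make the effect of the two defaults visible in the diff above concrete, here is a minimal sketch in plain Scala. It does not use Spark's `SQLConf`; the `Map`-based lookup, the object name, and `StreamingSettings` are assumptions made for illustration. Only the two key names and their defaults (`false` and `"append"`) come from the quoted diff, and the trigger key is left out because its default is not shown.

```scala
// Hypothetical stand-in for a session's config map; this is not Spark's SQLConf.
object SqlStreamingConfSketch {
  // Key names copied from the quoted diff.
  val WatermarkEnableKey = "spark.sqlstreaming.watermark.enable"
  val OutputModeKey = "spark.sqlstreaming.outputMode"

  // Hypothetical holder for the resolved values.
  final case class StreamingSettings(watermarkEnabled: Boolean, outputMode: String)

  // Falls back to the defaults shown in the diff when a key is not set.
  def resolve(conf: Map[String, String]): StreamingSettings =
    StreamingSettings(
      watermarkEnabled = conf.get(WatermarkEnableKey).map(_.toBoolean).getOrElse(false),
      outputMode = conf.getOrElse(OutputModeKey, "append"))
}
```

With an empty map, `resolve` yields a disabled watermark and the `append` output mode; setting either key overrides only that value.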
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99734/testReport)** for PR 23196 at commit [`07fcf46`](https://github.com/apache/spark/commit/07fcf4666a96928c8096db7a131e6514013679f0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23196 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23159 **[Test build #99735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99735/testReport)** for PR 23159 at commit [`e0aa626`](https://github.com/apache/spark/commit/e0aa626c886976489348a6c0179d160bbe3252da). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5774/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239110361
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
             val message = "Container killed by YARN for exceeding physical memory limits. " +
               s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
             (true, message)
+          case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
--- End diff --
Yeah, I agree this should be cleaned up: we already handle cases above that are in the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS set.
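The overlap mentioned in the comment above can be illustrated with a small, self-contained sketch. Scala's `match` takes the first case that applies, so a status handled by an explicit case never reaches a later guard on the set, even if the set contains it. The exit codes, the set contents, the messages, and the meaning of the Boolean are all hypothetical here and are not taken from YARN or from `YarnAllocator`.

```scala
object ExitStatusMatchSketch {
  // Illustrative exit codes only; these are not YARN's real constants.
  val Preempted = 1
  val KilledByResourceManager = 2
  val DisksFailed = 3

  // Hypothetical stand-in for the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS set.
  val NotAppAndSystemFault: Set[Int] =
    Set(Preempted, KilledByResourceManager, DisksFailed)

  /** Returns (countsAsFailure, message); both fields are illustrative. */
  def classify(exitStatus: Int): (Boolean, String) = exitStatus match {
    // Explicit case: wins even though Preempted is also in the set below.
    case Preempted => (false, "preempted")
    // Guarded case: only reached by set members not handled above.
    case s if NotAppAndSystemFault.contains(s) => (false, "not an app or system fault")
    case _ => (true, "counted as a failure")
  }
}
```

Because `Preempted` is matched first, keeping it in the set is harmless but redundant, which is the kind of cleanup the comment asks for.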
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99714/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99714/testReport)** for PR 23196 at commit [`07fcf46`](https://github.com/apache/spark/commit/07fcf4666a96928c8096db7a131e6514013679f0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23238 **[Test build #99732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99732/testReport)** for PR 23238 at commit [`5bbcf41`](https://github.com/apache/spark/commit/5bbcf41f34f2ca160da7ef4ebe4c54d15a2d09b5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23238 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23223 **[Test build #99733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99733/testReport)** for PR 23223 at commit [`65a70dc`](https://github.com/apache/spark/commit/65a70dcbb7993731104deab2592a5b969a31414e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5773/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23159 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23233 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark
Github user stczwd commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r239109280
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/SQLStreamingSink.scala ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes
+import org.apache.spark.sql.execution.command.RunnableCommand
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils
+import org.apache.spark.sql.sources.v2.StreamingWriteSupportProvider
+import org.apache.spark.sql.streaming.Trigger
+import org.apache.spark.util.Utils
+
+/**
+ * The basic RunnableCommand for SQLStreaming, using Command.run to start a streaming query.
+ *
+ * @param sparkSession
+ * @param extraOptions
+ * @param partitionColumnNames
+ * @param child
+ */
+case class SQLStreamingSink(sparkSession: SparkSession,
+    table: CatalogTable,
+    child: LogicalPlan)
+  extends RunnableCommand {
+
+  private val sqlConf = sparkSession.sqlContext.conf
+
+  /**
+   * The given column name may not be equal to any of the existing column names if we were in
+   * case-insensitive context. Normalize the given column name to the real one so that we don't
+   * need to care about case sensitivity afterwards.
+   */
+  private def normalize(df: DataFrame, columnName: String, columnType: String): String = {
+    val validColumnNames = df.logicalPlan.output.map(_.name)
+    validColumnNames.find(sparkSession.sessionState.analyzer.resolver(_, columnName))
+      .getOrElse(throw new AnalysisException(s"$columnType column $columnName not found in " +
+        s"existing columns (${validColumnNames.mkString(", ")})"))
+  }
+
+  /**
+   * Parse spark.sqlstreaming.trigger.seconds to Trigger
+   */
+  private def parseTrigger(): Trigger = {
+    val trigger = Utils.timeStringAsMs(sqlConf.sqlStreamTrigger)
+    Trigger.ProcessingTime(trigger, TimeUnit.MICROSECONDS)
--- End diff --
Yeah, I will change it to milliseconds.
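The unit mismatch acknowledged in the reply above can be reproduced without Spark. The sketch below is an illustration under stated assumptions: `intervalMs` is a hypothetical helper built on `scala.concurrent.duration.Duration`, standing in for a "time string to milliseconds" parser, and is not the `Utils.timeStringAsMs` call from the diff. It shows what happens when a value already expressed in milliseconds is tagged as microseconds.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.Duration

object TriggerIntervalSketch {
  // Hypothetical helper: parses strings such as "5 seconds" into milliseconds.
  def intervalMs(s: String): Long = Duration(s).toMillis

  def main(args: Array[String]): Unit = {
    val ms = intervalMs("5 seconds") // 5000, already in milliseconds

    // Value in milliseconds tagged as milliseconds: the interval is preserved.
    println(TimeUnit.MILLISECONDS.toMillis(ms)) // prints 5000

    // Same value tagged as microseconds: the interval shrinks by a factor of 1000.
    println(TimeUnit.MICROSECONDS.toMillis(ms)) // prints 5
  }
}
```

A 5-second trigger would therefore be interpreted as 5 milliseconds, which is why pairing a millisecond value with `TimeUnit.MILLISECONDS` matters.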
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99713/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23233 **[Test build #99717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99717/testReport)** for PR 23233 at commit [`a1e7744`](https://github.com/apache/spark/commit/a1e77445c2675137fbcddf73181c47469f159dbf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/23223 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22957 **[Test build #99713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99713/testReport)** for PR 22957 at commit [`e4f617f`](https://github.com/apache/spark/commit/e4f617fc7e47d7c49f3d773ac2d91c5508c0a239). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99717/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5772/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23159 **[Test build #99715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99715/testReport)** for PR 23159 at commit [`e0aa626`](https://github.com/apache/spark/commit/e0aa626c886976489348a6c0179d160bbe3252da). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23227 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99715/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...
GitHub user seancxmao opened a pull request: https://github.com/apache/spark/pull/23238
[SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-insensitive field resolution when reading from Parquet
## What changes were proposed in this pull request?
#22148 introduces a behavior change. According to the discussion at #22184, this PR updates the migration guide for upgrading from Spark 2.3 to 2.4.
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running: $ git pull https://github.com/seancxmao/spark SPARK-25132-doc-2.4
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23238.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23238
commit 5bbcf41f34f2ca160da7ef4ebe4c54d15a2d09b5 Author: seancxmao Date: 2018-12-05T15:05:38Z [SPARK-25132][SQL][FOLLOWUP] Update migration doc for case-insensitive field resolution when reading from Parquet
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23227 **[Test build #99731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99731/testReport)** for PR 23227 at commit [`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23227 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as a...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/23231 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23231 Then let me close this now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/23227 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99719/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23227 **[Test build #99719 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99719/testReport)** for PR 23227 at commit [`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23231 OK. Maybe we can add a few words to the ML migration guide to clearly announce this.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23213 `wholeStage=false, factoryMode=CODE_ONLY` and `wholeStage=false, factoryMode=NO_CODEGEN` should have more complete test coverage for `GenerateUnsafeProject`, `GenerateMutableProject`, etc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
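Running one suite under several configuration combinations, as discussed in the comment above, can be sketched as a cross product of settings. This is only an illustration in plain Scala: the short names `wholeStage` and `factoryMode` and the mode strings are copied from the comment and treated as opaque labels, while `combinations` and `runUnderEach` are hypothetical helpers rather than part of `SQLQueryTestSuite`.

```scala
object ConfigMatrixSketch {
  // Values copied from the comment above and treated as opaque labels.
  val wholeStageValues: Seq[Boolean] = Seq(true, false)
  val factoryModes: Seq[String] = Seq("CODE_ONLY", "NO_CODEGEN")

  // Every (wholeStage, factoryMode) pair as one settings map.
  val combinations: Seq[Map[String, String]] =
    for {
      wholeStage <- wholeStageValues
      factoryMode <- factoryModes
    } yield Map("wholeStage" -> wholeStage.toString, "factoryMode" -> factoryMode)

  // Hypothetical driver: runs the same test body once per settings map.
  def runUnderEach(test: Map[String, String] => Unit): Unit =
    combinations.foreach(test)
}
```

Two values for each setting give four runs; the two combinations named in the comment are the ones with `wholeStage` set to `false`.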
[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23163 Build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99708/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23163 **[Test build #99708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99708/testReport)** for PR 23163 at commit [`6cb993b`](https://github.com/apache/spark/commit/6cb993b26e6b6867b3315228b55624b98acf1dcb). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23231 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99707/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23231 I'm not seeing it in the migration guide; maybe I'm missing it. In any event, I don't think we need to keep this for 3.0.
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23231 **[Test build #99707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99707/testReport)** for PR 23231 at commit [`1716071`](https://github.com/apache/spark/commit/17160710cadc49b54f4385ae3ca9ddb0eb4034b0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OneHotEncoderEstimator @Since("2.3.0") (@Since("2.3.0") override val uid: String)` * `class OneHotEncoderEstimator(JavaEstimator, HasInputCols, HasOutputCols, HasHandleInvalid,`
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239090244 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) + extends ShuffleWriteMetricsReporter with Serializable { + @transient private[this] lazy val _bytesWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_BYTES_WRITTEN) + @transient private[this] lazy val _recordsWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN) + @transient private[this] lazy val _writeTime = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_WRITE_TIME) + + override private[spark] def incBytesWritten(v: Long): Unit = { +metricsReporter.incBytesWritten(v) +_bytesWritten.add(v) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_recordsWritten.set(_recordsWritten.value - v) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +metricsReporter.incRecordsWritten(v) +_recordsWritten.add(v) + } + override private[spark] def incWriteTime(v: Long): Unit = { +metricsReporter.incWriteTime(v) +_writeTime.add(v) + } + override private[spark] def decBytesWritten(v: 
Long): Unit = { +metricsReporter.decBytesWritten(v) +_bytesWritten.set(_bytesWritten.value - v) + } +} + +private[spark] object SQLShuffleWriteMetricsReporter { + val SHUFFLE_BYTES_WRITTEN = "shuffleBytesWritten" + val SHUFFLE_RECORDS_WRITTEN = "shuffleRecordsWritten" + val SHUFFLE_WRITE_TIME = "shuffleWriteTime" --- End diff -- cc @rxin, do you think we should change this metric to use ms as well, in all the places that read/write it?
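The reporter in the diff above wraps another reporter: each update is forwarded to the wrapped reporter and mirrored in a local metric. A minimal sketch of that pattern follows, assuming simplified stand-ins: `WriteReporter`, `Counters`, and `MirroringReporter` are hypothetical names, not Spark classes, and only the method names come from the diff. In the quoted diff `decRecordsWritten` forwards to `metricsReporter.decBytesWritten`; the sketch assumes each call is meant to go to its namesake.

```scala
object ReporterSketch {
  // Hypothetical stand-in for ShuffleWriteMetricsReporter; only the
  // method names are taken from the diff.
  trait WriteReporter {
    def incBytesWritten(v: Long): Unit
    def decBytesWritten(v: Long): Unit
    def incRecordsWritten(v: Long): Unit
    def decRecordsWritten(v: Long): Unit
    def incWriteTime(v: Long): Unit
  }

  // Plain counters, used here in place of both the task-level reporter
  // and the SQLMetric accumulators.
  final class Counters extends WriteReporter {
    var bytes = 0L
    var records = 0L
    var writeTime = 0L
    override def incBytesWritten(v: Long): Unit = bytes += v
    override def decBytesWritten(v: Long): Unit = bytes -= v
    override def incRecordsWritten(v: Long): Unit = records += v
    override def decRecordsWritten(v: Long): Unit = records -= v
    override def incWriteTime(v: Long): Unit = writeTime += v
  }

  // Forwards every update to `underlying` and mirrors it in `local`.
  final class MirroringReporter(local: Counters, underlying: WriteReporter)
      extends WriteReporter {
    override def incBytesWritten(v: Long): Unit = {
      underlying.incBytesWritten(v)
      local.incBytesWritten(v)
    }
    override def decBytesWritten(v: Long): Unit = {
      underlying.decBytesWritten(v)
      local.decBytesWritten(v)
    }
    override def incRecordsWritten(v: Long): Unit = {
      underlying.incRecordsWritten(v)
      local.incRecordsWritten(v)
    }
    override def decRecordsWritten(v: Long): Unit = {
      underlying.decRecordsWritten(v) // namesake call, by assumption
      local.decRecordsWritten(v)
    }
    override def incWriteTime(v: Long): Unit = {
      underlying.incWriteTime(v)
      local.incWriteTime(v)
    }
  }
}
```

With this shape the two sets of counters stay in step as long as every call goes through the wrapper.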
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 If we look at test coverage, `wholeStage=false, factoryMode=CODE_ONLY` will go through code paths that wholeStageCodegen doesn't cover. Or did I miss something?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 But whole stage codegen will not test `GenerateUnsafeProject`, `GenerateMutableProject`, etc., right?
[GitHub] spark issue #22184: [SPARK-25132][SQL][DOC] Add migration doc for case-insen...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22184 @srowen Sorry for the late reply! I'd like to close this PR and file a new one since our SQL doc has changed a lot. Thank you all for your comments and time!
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
Github user seancxmao closed the pull request at: https://github.com/apache/spark/pull/22184
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23222 Jenkins passes, which means the previously added end-to-end test cannot show the benefit of this rule. We should update it.
[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99711/
[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23224 Merged build finished. Test PASSed.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99730/
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23237 **[Test build #99730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99730/testReport)** for PR 23237 at commit [`90b111f`](https://github.com/apache/spark/commit/90b111f900d8f11e4d730e0cfbe56a1683f96faa). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Merged build finished. Test FAILed.
[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23224 **[Test build #99711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99711/testReport)** for PR 23224 at commit [`021728c`](https://github.com/apache/spark/commit/021728ccc70cf971592c560cfc5492dedbdc362a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/23222 Shall we add a SQL tag to the title?
[GitHub] spark issue #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99718/
[GitHub] spark issue #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow when enco...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23234 **[Test build #99718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99718/testReport)** for PR 23234 at commit [`930c510`](https://github.com/apache/spark/commit/930c51029b845c74357305e7ec30a4f2e6ea748a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23231 It is because the ML migration guide claims that we will keep OneHotEncoderEstimator as an alias. I'm fine if we now have consensus that we can avoid such an alias.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23237 **[Test build #99730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99730/testReport)** for PR 23237 at commit [`90b111f`](https://github.com/apache/spark/commit/90b111f900d8f11e4d730e0cfbe56a1683f96faa).
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Can one of the admins verify this patch?
[GitHub] spark pull request #23237: [SPARK-26279][CORE] Remove unused method in Loggi...
GitHub user seancxmao opened a pull request: https://github.com/apache/spark/pull/23237 [SPARK-26279][CORE] Remove unused method in Logging ## What changes were proposed in this pull request? The method `Logging.isTraceEnabled` is not used anywhere. We should remove it to avoid confusion. ## How was this patch tested? Test locally with existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/seancxmao/spark clean-logging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23237.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23237 commit 90b111f900d8f11e4d730e0cfbe56a1683f96faa Author: seancxmao Date: 2018-12-05T14:07:49Z [SPARK-26279][CORE] Remove unused methods in Logging
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239052799 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -612,11 +612,14 @@ private[yarn] class YarnAllocator( val message = "Container killed by YARN for exceeding physical memory limits. " + s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}." (true, message) + case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) => +(true, "Container marked as failed: " + containerId + onHostStr + --- End diff -- Nit: Use string interpolation.
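For readers unfamiliar with the nit, here is a small sketch of the two styles side by side. The parameter names mirror the diff, but their types and the sample values are assumptions made for illustration.

```scala
object InterpolationSketch {
  // Concatenation, in the style of the line under review.
  def concatenated(containerId: String, onHostStr: String): String =
    "Container marked as failed: " + containerId + onHostStr

  // The same message built with the s-interpolator.
  def interpolated(containerId: String, onHostStr: String): String =
    s"Container marked as failed: $containerId$onHostStr"
}
```

Both produce the same string; the interpolated form keeps the message readable as one literal.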
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239059997 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -612,11 +612,14 @@ private[yarn] class YarnAllocator( val message = "Container killed by YARN for exceeding physical memory limits. " + s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}." (true, message) + case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) => --- End diff -- I would prefer not to have it as a separate case but just a new `if` around `handleResourceAllocationFailure`. Since NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS is being introduced, it would make sense to separate it from the huge match on exitStatus. This way it would be easier to follow when it is really triggered (one would not have to check all the previous case branches and then consider this condition with contains). That way values like ContainerExitStatus.SUCCESS from the set would actually be used.
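One way to read this suggestion is sketched below: the match only builds the message, and the set membership test becomes a single guard around the failure handler. Everything here is a hypothetical stand-in: the exit codes, the message texts, and the body of `handleResourceAllocationFailure` are assumptions, not the YarnAllocator code.

```scala
object ExitStatusGuardSketch {
  // Hypothetical exit codes, chosen only for illustration.
  val Success = 0
  val Preempted = -102
  val NotAppAndSystemFaultExitStatus: Set[Int] = Set(Success, Preempted)

  // Records hosts with a counted failure; stands in for whatever the
  // real handleResourceAllocationFailure does.
  private val failedHosts = scala.collection.mutable.ListBuffer.empty[String]
  def recordedFailures: List[String] = failedHosts.toList

  private def handleResourceAllocationFailure(host: String): Unit =
    failedHosts += host

  def onContainerCompleted(exitStatus: Int, host: String): String = {
    // The match is only responsible for the message...
    val message = exitStatus match {
      case Success => "Container completed"
      case Preempted => "Container preempted"
      case other => s"Container exited with status $other"
    }
    // ...and the fault check is one explicit guard outside the match.
    if (!NotAppAndSystemFaultExitStatus.contains(exitStatus)) {
      handleResourceAllocationFailure(host)
    }
    message
  }
}
```

In this arrangement every member of the set is consulted, including a value such as `Success` that an earlier `case` branch would otherwise shadow.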
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239070925 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala --- @@ -114,13 +116,20 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter clock) } - def createContainer(host: String, resource: Resource = containerResource): Container = { -val containerId = ContainerId.newContainerId(appAttemptId, containerNum) + def createContainer( + host: String, + containerId: ContainerId = ContainerId.newContainerId(appAttemptId, containerNum), --- End diff -- Could this just take containerNumber as a parameter, with a default value of containerNum?
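A minimal sketch of that alternative signature, assuming a simplified `FakeContainer` in place of YARN's `Container` and a counter named after the suite's `containerNum`. Scala evaluates a default argument at each call, so the default picks up the counter's current value.

```scala
object CreateContainerSketch {
  // Hypothetical stand-in for YARN's Container.
  final case class FakeContainer(host: String, containerNumber: Int)

  private var containerNum = 0

  // The caller passes a plain number; the default reads the counter at call time.
  def createContainer(host: String, containerNumber: Int = containerNum): FakeContainer = {
    containerNum += 1
    FakeContainer(host, containerNumber)
  }
}
```

Tests that need a specific ID can pass `containerNumber` explicitly, while the others keep the auto-incrementing behaviour.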
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23228 **[Test build #4453 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4453/testReport)** for PR 23228 at commit [`d5dadbf`](https://github.com/apache/spark/commit/d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3).
[GitHub] spark issue #23229: [MINOR][CORE] Modify some field name because it may be c...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23229 Agree, this isn't worthwhile.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 Yea, I think they're not totally the same, but I'm not sure that the test run (`wholeStage=false, factoryMode=CODE_ONLY`) is worth the time cost.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239069014 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) + extends ShuffleWriteMetricsReporter with Serializable { + @transient private[this] lazy val _bytesWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_BYTES_WRITTEN) + @transient private[this] lazy val _recordsWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN) + @transient private[this] lazy val _writeTime = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_WRITE_TIME) + + override private[spark] def incBytesWritten(v: Long): Unit = { +metricsReporter.incBytesWritten(v) +_bytesWritten.add(v) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_recordsWritten.set(_recordsWritten.value - v) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +metricsReporter.incRecordsWritten(v) +_recordsWritten.add(v) + } + override private[spark] def incWriteTime(v: Long): Unit = { +metricsReporter.incWriteTime(v) +_writeTime.add(v) + } + override private[spark] def 
decBytesWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_bytesWritten.set(_bytesWritten.value - v) + } +} + +private[spark] object SQLShuffleWriteMetricsReporter { + val SHUFFLE_BYTES_WRITTEN = "shuffleBytesWritten" + val SHUFFLE_RECORDS_WRITTEN = "shuffleRecordsWritten" + val SHUFFLE_WRITE_TIME = "shuffleWriteTime" --- End diff -- Only the shuffle write time is affected in this PR. The only other time metric is `fetch wait time`, which is in ms and is set in `ShuffleBlockFetcherIterator`.
[GitHub] spark pull request #23196: [SPARK-26243][SQL] Use java.time API for parsing ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23196#discussion_r239068840 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala --- @@ -49,8 +49,8 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter { override def beforeAll() { super.beforeAll() TestHive.setCacheTables(true) -// Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*) -TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) +// Timezone is fixed to GMT for those timezone sensitive tests (timestamp_*) --- End diff -- I think consistency is indeed a problem, but why disable the new parser, rather than make this consistent? I haven't looked into whether there's a good reason they behave differently but suspect not.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239067552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics { Utils.bytesToString } else if (metricsType == TIMING_METRIC) { Utils.msDurationToString + } else if (metricsType == NS_TIMING_METRIC) { +duration => Utils.msDurationToString(duration / 1000 / 1000) --- End diff -- Maybe it's OK. I tested this locally with the UT in SQLMetricsSuites; the result is below: ``` shuffle records written: 2 shuffle write time total (min, med, max): 37 ms (37 ms, 37 ms, 37 ms) shuffle bytes written total (min, med, max): 66.0 B (66.0 B, 66.0 B, 66.0 ``` In a real scenario the shuffle bytes written will be much larger, so keeping the time in ms may be enough. WDYT?
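The conversion in the diff is plain integer arithmetic, so any remainder below one millisecond is dropped before formatting. A small sketch of that arithmetic, assuming hypothetical helper names (`nanosToMillis` is not a Spark function); `TimeUnit.NANOSECONDS.toMillis` from the JDK performs the same truncating conversion.

```scala
import java.util.concurrent.TimeUnit

object NsTimingSketch {
  // Same arithmetic as the diff: two integer divisions by 1000.
  def nanosToMillis(durationNs: Long): Long = durationNs / 1000 / 1000

  // Equivalent conversion using the JDK's TimeUnit.
  def nanosToMillisViaTimeUnit(durationNs: Long): Long =
    TimeUnit.NANOSECONDS.toMillis(durationNs)
}
```

A write that takes less than one millisecond in total is therefore displayed as 0 ms, which is the trade-off being discussed.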
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Github user wangjiaochun commented on the issue: https://github.com/apache/spark/pull/23225 1. I think the test case writeEmptyIterator in UnsafeShuffleWriterSuite.java covers this scenario. 2. I will file a JIRA soon.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23218 Ah OK, so all of them were a JVM crash. It would probably be a good idea to update the JVM on all the workers as _60 is over 3 years old. It's probably not as simple as it sounds but WDYT @shaneknapp ?
[GitHub] spark pull request #23226: [MINOR][TEST] Add MAXIMUM_PAGE_SIZE_BYTES Excepti...
Github user wangjiaochun commented on a diff in the pull request: https://github.com/apache/spark/pull/23226#discussion_r239066440 --- Diff: core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java --- @@ -622,6 +622,17 @@ public void initialCapacityBoundsChecking() { } catch (IllegalArgumentException e) { // expected exception } + +try { + new BytesToBytesMap( + taskMemoryManager, --- End diff -- OK, I will correct this indentation and file a JIRA.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99729/
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99729/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR What counts as "not a big deal" is debatable: I've seen a glob request take ~1 hour when a huge number of files were stored :) If the file move is even worse, that is one more reason to move it to a separate thread.
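The idea of taking slow file cleanup off the calling thread can be sketched as follows. This is only an illustration under stated assumptions: `AsyncCleanupSketch`, `cleanAsync`, and the `slowCleanup` callback are hypothetical names, and nothing here reflects how the PR actually implements cleanup.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object AsyncCleanupSketch {
  // A dedicated single thread, so cleanup never blocks the caller.
  private val cleanupPool = Executors.newSingleThreadExecutor()
  private implicit val cleanupContext: ExecutionContext =
    ExecutionContext.fromExecutor(cleanupPool)

  // `slowCleanup` stands in for an expensive move or delete of one file.
  // The future completes with the number of files processed.
  def cleanAsync(files: Seq[String])(slowCleanup: String => Unit): Future[Int] =
    Future {
      files.foreach(slowCleanup)
      files.size
    }

  // The pool uses a non-daemon thread, so it must be shut down explicitly.
  def shutdown(): Unit = cleanupPool.shutdown()
}
```

The caller gets a `Future` back immediately and can keep processing while the cleanup runs in the background.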
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99727/
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99727/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99726/
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.