[GitHub] [spark] AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11245/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497554557 **[Test build #105987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105987/testReport)** for PR 24700 at commit [`6ad1cd8`](https://github.com/apache/spark/commit/6ad1cd8bf99693675541de2006e9cb006b1b1c95).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins removed a comment on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585751 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497586002 **[Test build #105987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105987/testReport)** for PR 24700 at commit [`6ad1cd8`](https://github.com/apache/spark/commit/6ad1cd8bf99693675541de2006e9cb006b1b1c95).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497586177 **[Test build #105995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105995/testReport)** for PR 24749 at commit [`5b7a025`](https://github.com/apache/spark/commit/5b7a025e101246b67d312cb7dcd918e379964a9c).
[GitHub] [spark] AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585751 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
AmplabJenkins commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497585758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11245/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
SparkQA commented on issue #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens URL: https://github.com/apache/spark/pull/24749#issuecomment-497584679 **[Test build #105994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105994/testReport)** for PR 24749 at commit [`8fd8fa9`](https://github.com/apache/spark/commit/8fd8fa933c6644d017212ed83872ab6aa4a71f35).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
URL: https://github.com/apache/spark/pull/24750#discussion_r289264248

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

## @@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
   For this case, yep. If you want, you can remove it.
[GitHub] [spark] liancheng commented on a change in pull request #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
liancheng commented on a change in pull request #24749: [SPARK-27890][SQL] Improve SQL parser error message for identifier with hyphens
URL: https://github.com/apache/spark/pull/24749#discussion_r289264175

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

## @@ -786,6 +790,16 @@ qualifiedName
     : identifier ('.' identifier)*
     ;

+errorCapturingIdentifier
+    : identifier errorCapturingIdentifierExtra
+    ;
+
+// extrq grammer for left refactoring

Review comment:
   Typo: extrq => extra
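[Editor's note] The `errorCapturingIdentifier` rule above lets the parser accept a hyphen-separated identifier and then report a targeted error. As a rough illustration only (the object and method names below are hypothetical and not Spark's actual parser code), the kind of hint SPARK-27890 aims for could be produced like this:

```scala
// Hypothetical sketch: given the dash-separated parts of an identifier
// (e.g. "test-table" tokenized as Seq("test", "table")), suggest
// back-quoting the identifier when it contains hyphens.
object HyphenIdentifierHint {
  def errorMessage(parts: Seq[String]): Option[String] =
    if (parts.length > 1) {
      val ident = parts.mkString("-")
      Some(s"Possibly unquoted identifier $ident detected. " +
        s"Please consider quoting it with back-quotes as `$ident`")
    } else {
      None // a single-part identifier needs no hint
    }
}

object Demo extends App {
  println(HyphenIdentifierHint.errorMessage(Seq("test", "table")))
  println(HyphenIdentifierHint.errorMessage(Seq("plain")))
}
```

This is only meant to show the shape of the improved message; the real change lives in the grammar and `AstBuilder`.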
[GitHub] [spark] dongjoon-hyun commented on issue #24743: [WIP][SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
dongjoon-hyun commented on issue #24743: [WIP][SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] URL: https://github.com/apache/spark/pull/24743#issuecomment-497584161 Got it, @wangyum !
[GitHub] [spark] mwlon commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
mwlon commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
URL: https://github.com/apache/spark/pull/24750#discussion_r289263830

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

## @@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
   Ah, right - actually, can I just get rid of the JIRA tag?
[GitHub] [spark] dongjoon-hyun commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
dongjoon-hyun commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497583959 Wow, looks useful!
[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582962 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11244/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582958 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497583318 **[Test build #105993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105993/testReport)** for PR 24472 at commit [`0ee3bc9`](https://github.com/apache/spark/commit/0ee3bc9e870ca583c85029b1c7e29c4f089365f8).
[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582962 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11244/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582958 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
dongjoon-hyun commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497583059 Thank you for making the PR again, @wangyum .
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
URL: https://github.com/apache/spark/pull/24327#discussion_r289261872

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala

## @@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2.parquet
+
+import java.net.URI
+import java.util.TimeZone
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.parquet.filter2.compat.FilterCompat
+import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
+import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
+import org.apache.parquet.hadoop.{ParquetFileReader, ParquetInputFormat, ParquetInputSplit, ParquetRecordReader}
+
+import org.apache.spark.TaskContext
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.execution.datasources.{PartitionedFile, RecordReaderIterator}
+import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.{InputPartition, PartitionReader}
+import org.apache.spark.sql.types.{AtomicType, StructType}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.SerializableConfiguration
+
+/**
+ * A factory used to create Parquet readers.
+ *
+ * @param sqlConf SQL configuration.
+ * @param broadcastedConf Broadcast serializable Hadoop Configuration.
+ * @param dataSchema Schema of Parquet files.
+ * @param readDataSchema Required schema of Parquet files.
+ * @param partitionSchema Schema of partitions.
+ * @param filters Filters of the batch scan.
+ */
+case class ParquetPartitionReaderFactory(
+    sqlConf: SQLConf,
+    broadcastedConf: Broadcast[SerializableConfiguration],
+    dataSchema: StructType,
+    readDataSchema: StructType,
+    partitionSchema: StructType,
+    filters: Array[Filter]) extends FilePartitionReaderFactory with Logging {
+  private val isCaseSensitive = sqlConf.caseSensitiveAnalysis
+  private val resultSchema = StructType(partitionSchema.fields ++ readDataSchema.fields)
+  private val enableOffHeapColumnVector = sqlConf.offHeapColumnVectorEnabled
+  private val enableVectorizedReader: Boolean = sqlConf.parquetVectorizedReaderEnabled &&
+    resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  private val enableRecordFilter: Boolean = sqlConf.parquetRecordFilterEnabled
+  private val timestampConversion: Boolean = sqlConf.isParquetINT96TimestampConversion
+  private val capacity = sqlConf.parquetVectorizedReaderBatchSize
+  private val enableParquetFilterPushDown: Boolean = sqlConf.parquetFilterPushDown
+  private val pushDownDate = sqlConf.parquetFilterPushDownDate
+  private val pushDownTimestamp = sqlConf.parquetFilterPushDownTimestamp
+  private val pushDownDecimal = sqlConf.parquetFilterPushDownDecimal
+  private val pushDownStringStartWith = sqlConf.parquetFilterPushDownStringStartWith
+  private val pushDownInFilterThreshold = sqlConf.parquetFilterPushDownInFilterThreshold
+
+  override def supportColumnarReads(partition: InputPartition): Boolean = {
+    sqlConf.parquetVectorizedReaderEnabled && sqlConf.wholeStageEnabled &&
+      resultSchema.length <= sqlConf.wholeStageMaxNumFields &&
+      resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  }
+
+  override def buildReader(file: PartitionedFile): PartitionReader[InternalRow] = {
+    val reader = if (enableVectorizedReader) {
+      createVectorizedReader(file)
+    } else {
+      createRowBaseReader(file)
+    }
+
+    val fileReader = new PartitionReader[InternalRow] {
+      override def next(): Boolean =
[GitHub] [spark] wangyum commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax
wangyum commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax URL: https://github.com/apache/spark/pull/24472#issuecomment-497582394 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105989/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581305 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497567324 **[Test build #105989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289261505 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.parquet.hadoop.ParquetInputFormat + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex +import org.apache.spark.sql.execution.datasources.parquet.{ParquetReadSupport, ParquetWriteSupport} +import org.apache.spark.sql.execution.datasources.v2.FileScan +import org.apache.spark.sql.execution.datasources.v2.orc.OrcScan +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.PartitionReaderFactory +import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.util.CaseInsensitiveStringMap +import org.apache.spark.util.SerializableConfiguration + +case class ParquetScan( +sparkSession: SparkSession, +hadoopConf: Configuration, +fileIndex: PartitioningAwareFileIndex, +dataSchema: StructType, +readDataSchema: StructType, +readPartitionSchema: StructType, +filters: Array[Filter], +pushedFilters: Array[Filter], +options: CaseInsensitiveStringMap) + extends FileScan(sparkSession, fileIndex, readDataSchema, readPartitionSchema) { + override def isSplitable(path: Path): Boolean = true + + override def createReaderFactory(): PartitionReaderFactory = { +hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName) +hadoopConf.set( + ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA, + readDataSchema.json) +hadoopConf.set( + ParquetWriteSupport.SPARK_ROW_SCHEMA, + readDataSchema.json) +hadoopConf.set( + SQLConf.SESSION_LOCAL_TIMEZONE.key, + sparkSession.sessionState.conf.sessionLocalTimeZone) +hadoopConf.setBoolean( + SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key, + sparkSession.sessionState.conf.nestedSchemaPruningEnabled) +hadoopConf.setBoolean( + SQLConf.CASE_SENSITIVE.key, + sparkSession.sessionState.conf.caseSensitiveAnalysis) + 
+ParquetWriteSupport.setSchema(readDataSchema, hadoopConf) + +// Sets flags for `ParquetToSparkSchemaConverter` +hadoopConf.setBoolean( + SQLConf.PARQUET_BINARY_AS_STRING.key, + sparkSession.sessionState.conf.isParquetBinaryAsString) +hadoopConf.setBoolean( + SQLConf.PARQUET_INT96_AS_TIMESTAMP.key, + sparkSession.sessionState.conf.isParquetINT96AsTimestamp) + +val broadcastedConf = sparkSession.sparkContext.broadcast( + new SerializableConfiguration(hadoopConf)) +ParquetPartitionReaderFactory(sparkSession.sessionState.conf, broadcastedConf, + dataSchema, readDataSchema, readPartitionSchema, filters) Review comment: This should be `pushedFilters` instead of `filters` since we already converted them.
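The distinction behind this review comment — a scan keeps both the raw requested filters and the subset actually pushed down to the source, and the reader factory should be built from the pushed (already-converted) subset — can be sketched outside Spark in a few lines of Python. All names below are hypothetical illustrations, not Spark APIs:

```python
# Hypothetical sketch, not Spark code: a scan tracks the requested filters
# and the subset the source could push down; the reader factory is built
# from the pushed subset, while the rest is evaluated after the scan.
def split_filters(filters, supported):
    pushed = [f for f in filters if f in supported]
    post_scan = [f for f in filters if f not in supported]
    return pushed, post_scan

pushed, post_scan = split_filters(
    ["a > 1", "udf(b) = 2"], supported={"a > 1"})
# pushed holds only the source-supported predicate; the UDF predicate
# cannot be pushed and must run after the scan.
```

Passing the raw list by mistake would hand the reader factory predicates it cannot evaluate, which is exactly the bug the reviewer flags.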
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581305 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105989/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497581160 **[Test build #105989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#discussion_r289260986 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -442,3 +519,172 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { } } } + + +/** + * This object targets to integrate various UDF test cases so that Scalar UDF, Python UDF and + * Scalar Pandas UDFs can be tested in SBT & Maven tests. + * + * The available UDFs cast input to strings and take one column as input with a string type + * column as output. + * + * To register Scala UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalaUDF, spark) + * }}} + * + * To register Python UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) + * }}} + * + * To register Scalar Pandas UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) + * }}} + * + * To use it in Scala API and SQL: + * {{{ + * sql("SELECT udf(1)") + * spark.select(expr("udf(1)")) + * }}} + * + * They are currently registered as the name 'udf' in function registry. + */ +object IntegratedUDFTestUtils extends SQLHelper with Logging { + import scala.sys.process._ + + lazy val pythonExec: String = { +val pythonExec = sys.env.getOrElse("PYSPARK_PYTHON", "python3.6") Review comment: `python3.6` is for Jenkins. Just using `python` could be enough. I should see if it runs correctly in Jenkins. We will likely deprecate Python 2 anyway.
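The contract described in the scaladoc above — every test UDF takes a single column and casts its input to string, regardless of whether it is implemented as a Scala, Python, or Scalar Pandas UDF — can be mimicked without Spark. A minimal pure-Python stand-in (the name `cast_to_string_udf` is illustrative, not part of the PR):

```python
# Pure-Python stand-in for the cast-to-string contract shared by the
# Scala, Python, and Scalar Pandas test UDFs (illustrative only).
def cast_to_string_udf(value):
    # NULL input stays NULL; everything else becomes its string form.
    return None if value is None else str(value)

# Usage: wrapping any column expression yields a string-typed result,
# which is what lets one .sql file produce identical output under all
# three UDF implementations.
```

Because all three UDF flavors share this behavior, a single expected-output file per `udf-*.sql` input suffices.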
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11243/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497580311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11243/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289260080 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.parquet.hadoop.ParquetInputFormat + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex +import org.apache.spark.sql.execution.datasources.parquet.{ParquetReadSupport, ParquetWriteSupport} +import org.apache.spark.sql.execution.datasources.v2.FileScan +import org.apache.spark.sql.execution.datasources.v2.orc.OrcScan +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.PartitionReaderFactory +import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.util.CaseInsensitiveStringMap +import org.apache.spark.util.SerializableConfiguration + +case class ParquetScan( +sparkSession: SparkSession, +hadoopConf: Configuration, +fileIndex: PartitioningAwareFileIndex, +dataSchema: StructType, +readDataSchema: StructType, +readPartitionSchema: StructType, +filters: Array[Filter], +pushedFilters: Array[Filter], +options: CaseInsensitiveStringMap) + extends FileScan(sparkSession, fileIndex, readDataSchema, readPartitionSchema) { + override def isSplitable(path: Path): Boolean = true + + override def createReaderFactory(): PartitionReaderFactory = { +hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName) +hadoopConf.set( + ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA, + readDataSchema.json) +hadoopConf.set( + ParquetWriteSupport.SPARK_ROW_SCHEMA, + readDataSchema.json) Review comment: nit. Since we are making a new class, could you declare a `val` for `readDataSchema.json` and reuse it at lines 52 and 55?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11242/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins removed a comment on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579057 Merged build finished. Test PASSed.
[GitHub] [spark] gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289259910 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java ## @@ -25,11 +25,11 @@ // Redirects the JUL logging for parquet-mr versions <= 1.8 to SLF4J logging using // SLF4JBridgeHandler. Parquet-mr versions >= 1.9 use SLF4J directly -final class ParquetLogRedirector implements Serializable { +public final class ParquetLogRedirector implements Serializable { Review comment: > Spark uses Parquet >= 1.9. Is this still needed? I am not sure about this. I think we can resolve this in another Jira/PR. > Why was it made public? We need to make it public so that ParquetWriteBuilder can access it. As per the discussion in https://issues.apache.org/jira/browse/SPARK-16964, I think it is fine to do this in the `sql.execution` package.
[GitHub] [spark] SparkQA commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
SparkQA commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579378 **[Test build #105992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105992/testReport)** for PR 24752 at commit [`a377255`](https://github.com/apache/spark/commit/a3772558b5d50b037cf9f7a53c344c6c4aa123bc).
[GitHub] [spark] HyukjinKwon commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579209 cc @BryanCutler, @cloud-fan, @icexelloss, @viirya, @gatorsmile, @ueshin, @wangyum, @dilipbiswal, @dongjoon-hyun
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11242/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
AmplabJenkins commented on issue #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#issuecomment-497579057 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon commented on a change in pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752#discussion_r289259435 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -442,3 +519,172 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { } } } + + +/** + * This object targets to integrate various UDF test cases so that Scalar UDF, Python UDF and + * Scalar Pandas UDFs can be tested in SBT & Maven tests. + * + * The available UDFs cast input to strings and take one column as input with a string type + * column as output. + * + * To register Scala UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalaUDF, spark) + * }}} + * + * To register Python UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) + * }}} + * + * To register Scalar Pandas UDF in SQL: + * {{{ + * IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) + * }}} + * + * To use it in Scala API and SQL: + * {{{ + * sql("SELECT udf(1)") + * spark.select(expr("udf(1)")) + * }}} + * + * They are currently registered as the name 'udf' in function registry. + */ +object IntegratedUDFTestUtils extends SQLHelper with Logging { Review comment: Maybe this has to be moved somewhere else later to be used in Scala APIs too.
[GitHub] [spark] HyukjinKwon opened a new pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files
HyukjinKwon opened a new pull request #24752: [SPARK-27893][SQL][PYTHON] Create an integrated test base for Python, Scalar Pandas, Scala UDF by sql files URL: https://github.com/apache/spark/pull/24752 ## What changes were proposed in this pull request? This PR targets to add an integrated test base for various UDF test cases so that Scalar UDF, Python UDF and Scalar Pandas UDFs can be tested in SBT & Maven tests. ### Problem One of the problems we face is that `ExtractPythonUDF[s|FromAggregate]` has unevaluable expressions that always have to be wrapped with special plans. This special rule seems to produce many issues, for instance, SPARK-27803, SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721. ### Why do we have fewer test cases dedicated to SQL and plans? We don't have such SQL (or plan) dedicated tests in PySpark to catch such issues because: - A developer should know SQL, PySpark, Py4J and the version differences in Python to write such good test cases - To test plans, we have to access plans in the JVM via Py4J, which is tricky, messy and duplicates JVM test cases - Usually we just add end-to-end test cases in PySpark; therefore there are not so many examples to refer to It is non-trivial overhead to switch the test base and method (IMHO). ### How does this PR fix it? This PR adds Python UDF and Scalar Pandas UDF at runtime of SBT / Maven test cases. It generates a Python-pickled instance (consisting of the return type and a Python native function) that is used in a Python or Scalar Pandas UDF and brings it directly into the JVM. After that, we don't interact via Py4J anymore but run the tests directly in the JVM - we can just register and run Python UDFs and Scalar Pandas UDFs in the JVM. Currently, I only integrated this change into SQL file based testing. This is how it works with `udf-*.sql` files: After the test files starting with `udf-*.sql` are detected, it creates three test cases: - Scala UDF test case with a Scalar UDF registered named 'udf'. 
- Python UDF test case with a Python UDF registered named 'udf' iff Python executable and pyspark are available. - Scalar Pandas UDF test case with a Scalar Pandas UDF registered named 'udf' iff Python executable, pandas, pyspark and pyarrow are available. Therefore, UDF test cases should have single input and output files but be executed by three different types of UDFs. For instance, ```sql CREATE TEMPORARY VIEW ta AS SELECT udf(a) AS a, udf('a') AS tag FROM t1 UNION ALL SELECT udf(a) AS a, udf('b') AS tag FROM t2; CREATE TEMPORARY VIEW tb AS SELECT udf(a) AS a, udf('a') AS tag FROM t3 UNION ALL SELECT udf(a) AS a, udf('b') AS tag FROM t4; SELECT tb.* FROM ta INNER JOIN tb ON ta.a = tb.a AND ta.tag = tb.tag; ``` will be run 3 times with a Scalar UDF, Python UDF and Scalar Pandas UDF each. ### Appendix Plus, this PR adds `IntegratedUDFTestUtils` which enables testing and executing Python UDFs and Scalar Pandas UDFs as below: To register Python UDF in SQL: ```scala IntegratedUDFTestUtils.registerTestUDF(new TestPythonUDF, spark) ``` To register Scalar Pandas UDF in SQL: ```scala IntegratedUDFTestUtils.registerTestUDF(new TestScalarPandasUDF, spark) ``` To use it in Scala API: ```scala spark.select(expr("udf(1)")).show() ``` To use it in SQL: ```scala sql("SELECT udf(1)").show() ``` This util could be used in the future for better coverage with Scala API combinations as well. ## How was this patch tested? 
Tested via the command below: ```bash build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-inner-join.sql" ``` ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (5 seconds, 47 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF (4 seconds, 335 milliseconds) [info] - udf/udf-inner-join.sql - Scalar Pandas UDF (5 seconds, 423 milliseconds) ``` [python] unavailable: ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (4 seconds, 577 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF is skipped because [pyton] and/or pyspark were not available. !!! IGNORED !!! [info] - udf/udf-inner-join.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [pyton]. !!! IGNORED !!! ``` pyspark unavailable: ``` [info] SQLQueryTestSuite: [info] - udf/udf-inner-join.sql - Scala UDF (4 seconds, 991 milliseconds) [info] - udf/udf-inner-join.sql - Python UDF is skipped because [python] and/or pyspark were not available. !!! IGNORED !!! [info] - udf/udf-inner-join.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python]. !!! IGNORED !!! ``` pandas
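The per-file expansion shown in the test output above — one `udf-*.sql` input generating a Scala UDF, a Python UDF, and a Scalar Pandas UDF test case, with the latter two ignored when their dependencies are missing — can be sketched roughly as follows. This is a simplification for illustration, not the suite's actual code; `expand_test_cases` and its parameters are hypothetical names:

```python
# Rough sketch of the three-way test expansion described above
# (illustrative only; not the SQLQueryTestSuite implementation).
def expand_test_cases(sql_file, have_python=True, have_pyarrow=True):
    cases = [
        (f"{sql_file} - Scala UDF", True),                      # always runnable
        (f"{sql_file} - Python UDF", have_python),              # needs python + pyspark
        (f"{sql_file} - Scalar Pandas UDF",
         have_python and have_pyarrow),                         # also needs pandas/pyarrow
    ]
    return [(name, "run" if ok else "ignored") for name, ok in cases]

# With pyarrow missing, only the Scalar Pandas UDF variant is ignored,
# matching the "!!! IGNORED !!!" lines in the sbt output above.
cases = expand_test_cases("udf/udf-inner-join.sql", have_pyarrow=False)
```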
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
dongjoon-hyun commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r289259062 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ## @@ -238,7 +240,8 @@ case class AlterTableAddColumnsCommand( // TextFileFormat only default to one column "value" // Hive type is already considered as hive serde table, so the logic will not // come in here. -case _: JsonFileFormat | _: CSVDataSourceV2 | _: ParquetFileFormat | _: OrcDataSourceV2 => +case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat => Review comment: Could you make this another small PR, since this is a Parquet migration PR? Also, it would be great if the PR had test coverage for this missing part.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins removed a comment on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577720 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11241/
[GitHub] [spark] AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
AmplabJenkins commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497577720 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289258283

## File path: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala

@@ -278,6 +278,31 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
     assert(uris.asScala.forall(_.getCache))
   }

+  test("SPARK-26082 supports setting fetcher cache in the submission") {

Review comment:
```
- test("SPARK-26082 supports setting fetcher cache in the submission") {
+ test("SPARK-26192 supports setting fetcher cache in the submission") {
```
[GitHub] [spark] SparkQA commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
SparkQA commented on issue #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751#issuecomment-497576728 **[Test build #105991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105991/testReport)** for PR 24751 at commit [`addb908`](https://github.com/apache/spark/commit/addb9087b34bfb83aec9c300f473b88a08b670d9).
[GitHub] [spark] wangyum opened a new pull request #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency
wangyum opened a new pull request #24751: [SPARK-27831][SQL][TEST][test-maven] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24751

## What changes were proposed in this pull request?

This PR moves the Hive test jars (`hive-contrib-0.13.1.jar`, `hive-hcatalog-core-0.13.1.jar`, `hive-contrib-2.3.5.jar` and `hive-hcatalog-core-2.3.5.jar`) to Maven dependencies.

## How was this patch tested?

Existing tests. Please note that this PR needs to be tested with both `maven` and `sbt`.
[GitHub] [spark] cloud-fan commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
cloud-fan commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#discussion_r289256656

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

@@ -0,0 +1,380 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import java.util
+import java.util.concurrent.LinkedBlockingQueue
+
+import scala.collection.JavaConverters._
+import scala.collection.concurrent.TrieMap
+import scala.collection.mutable
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.SparkException
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, ReturnAnswer}
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * A root node to execute the query plan adaptively. It splits the query plan into independent
+ * stages and executes them in order according to their dependencies. The query stage
+ * materializes its output at the end. When one stage completes, the data statistics of the
+ * materialized output will be used to optimize the remainder of the query.
+ *
+ * To create query stages, we traverse the query tree bottom up. When we hit an exchange node,
+ * and if all the child query stages of this exchange node are materialized, we create a new
+ * query stage for this exchange node. The new stage is then materialized asynchronously once it
+ * is created.
+ *
+ * When one query stage finishes materialization, the rest of the query is re-optimized and
+ * planned based on the latest statistics provided by all materialized stages. Then we traverse
+ * the query plan again and create more stages if possible. After all stages have been
+ * materialized, we execute the rest of the plan.
+ */
+case class AdaptiveSparkPlanExec(
+    initialPlan: SparkPlan,
+    session: SparkSession,
+    subqueryMap: Map[Long, ExecSubqueryExpression],
+    stageCache: TrieMap[SparkPlan, QueryStageExec])
+  extends LeafExecNode {
+
+  def executedPlan: SparkPlan = currentPhysicalPlan
+
+  override def conf: SQLConf = session.sessionState.conf
+
+  override def output: Seq[Attribute] = initialPlan.output
+
+  override def doCanonicalize(): SparkPlan = initialPlan.canonicalized
+
+  override def doExecute(): RDD[InternalRow] = lock.synchronized {
+    var currentLogicalPlan = currentPhysicalPlan.logicalLink.get
+    var result = createQueryStages(currentPhysicalPlan)
+    val events = new LinkedBlockingQueue[StageMaterializationEvent]()
+    val errors = new mutable.ArrayBuffer[SparkException]()
+    while (!result.allChildStagesMaterialized) {
+      currentPhysicalPlan = result.newPlan
+      currentLogicalPlan = updateLogicalPlan(currentLogicalPlan, result.newStages)
+      currentPhysicalPlan.setTagValue(SparkPlan.LOGICAL_PLAN_TAG, currentLogicalPlan)
+      onUpdatePlan()
+
+      // Start materialization of all new stages.
+      result.newStages.map(_._2).foreach { stage =>
+        stage.materialize().onComplete { res =>
+          if (res.isSuccess) {
+            events.offer(StageSuccess(stage, res.get))
+          } else {
+            events.offer(StageFailure(stage, res.failed.get))
+          }
+        }(AdaptiveSparkPlanExec.executionContext)
+      }
+
+      // Wait on the next completed stage, which indicates new stats are available and probably
+      // new stages can be created. There might be other stages that finish at around the same
+      // time, so we process those stages too in order to reduce re-planning.
+      val nextMsg = events.take()
+      val rem = new util.ArrayList[StageMaterializationEvent]()
+      events.drainTo(rem)
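The "wait on the next completed stage" step above can be sketched in isolation: block until at least one stage completes, then drain any other completions that arrived at about the same time so a single re-planning pass covers them all. The following is an illustrative standalone sketch, not Spark source; `StageEvent`, `StageDone` and `drainBatch` are hypothetical names.

```scala
import java.util
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.JavaConverters._

// Stand-in for Spark's StageMaterializationEvent hierarchy (illustrative only).
sealed trait StageEvent
case class StageDone(stageId: Int) extends StageEvent

// Block until one stage finishes, then sweep up every other completion that is
// already queued, so one re-planning pass can account for the whole batch.
def drainBatch(events: LinkedBlockingQueue[StageEvent]): Seq[StageEvent] = {
  val first = events.take()             // blocks until at least one event arrives
  val rest = new util.ArrayList[StageEvent]()
  events.drainTo(rest)                  // non-blocking: grab the stragglers
  first +: rest.asScala.toSeq
}
```

This mirrors the `events.take()` / `events.drainTo(rem)` pair in `doExecute` above: the blocking call guarantees progress, while the drain reduces the number of re-planning rounds when several stages finish close together.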
[GitHub] [spark] William1104 commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring
William1104 commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring URL: https://github.com/apache/spark/pull/24747#discussion_r289256350

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala

@@ -255,59 +255,65 @@ private[sql] trait SQLTestUtilsBase
    * Drops temporary view `viewNames` after calling `f`.
    */
   protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
   }

   /**
    * Drops global temporary view `viewNames` after calling `f`.
    */
   protected def withGlobalTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // global temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropGlobalTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropGlobalTempView))
   }

   /**
    * Drops table `tableName` after calling `f`.
    */
   protected def withTable(tableNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      tableNames.foreach { name =>
-        spark.sql(s"DROP TABLE IF EXISTS $name")
-      }
-    }
+    tryWithFinally(f)(tableNames.foreach { name =>
+      spark.sql(s"DROP TABLE IF EXISTS $name")
+    })
   }

   /**
    * Drops view `viewName` after calling `f`.
    */
   protected def withView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinally(f)(
       viewNames.foreach { name =>
         spark.sql(s"DROP VIEW IF EXISTS $name")
       }
-    }
+    )
   }

   /**
    * Drops cache `cacheName` after calling `f`.
    */
   protected def withCache(cacheNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      cacheNames.foreach { cacheName =>
-        try uncacheTable(cacheName) catch {
-          case _: AnalysisException =>
+    tryWithFinally(f)(cacheNames.foreach(uncacheTable))
+  }
+
+  /**
+   * Executes the given tryBlock and then the given finallyBlock, no matter whether tryBlock
+   * throws an exception. If both tryBlock and finallyBlock throw exceptions, the exception
+   * thrown from the finallyBlock will be added to the exception thrown from tryBlock as a
+   * suppressed exception. This helps to avoid masking the exception from tryBlock with the
+   * exception from finallyBlock.
+   */
+  private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {

Review comment: You are right, they look almost exactly the same, and that function has been in Spark for four years already. I didn't do enough research on what we already have in Spark; the helper I created is redundant, and even the name is very similar. I will update the tests to reuse `Utils.tryWithSafeFinally` in `SQLTestUtils`.
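The suppressed-exception pattern this thread settles on can be sketched as follows. This is a hedged, standalone illustration of the same idea as Spark's existing `Utils.tryWithSafeFinally`, not the actual Spark source; the `TryWithFinallyDemo` wrapper is added only to keep the sketch self-contained.

```scala
object TryWithFinallyDemo {
  // Run tryBlock, then finallyBlock. If both throw, keep the tryBlock exception
  // as the primary failure and attach the cleanup failure as a suppressed
  // exception, so the original test failure is never masked by cleanup.
  def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
    var originalThrowable: Throwable = null
    try {
      tryBlock
    } catch {
      case t: Throwable =>
        originalThrowable = t
        throw t
    } finally {
      try {
        finallyBlock
      } catch {
        case t: Throwable if originalThrowable != null && originalThrowable != t =>
          // Don't mask the primary failure: record the cleanup failure instead.
          originalThrowable.addSuppressed(t)
        case t: Throwable =>
          throw t
      }
    }
  }
}
```

If only the cleanup throws, its exception propagates normally; if both throw, the caller sees the primary exception with the cleanup exception available via `getSuppressed`.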
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105990/
[GitHub] [spark] SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497573540 **[Test build #105990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105990/testReport)** for PR 24750 at commit [`acbace0`](https://github.com/apache/spark/commit/acbace036f05f554221d45f7138c5a2861c90d5e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289253607

## File path: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala

@@ -430,6 +429,9 @@ private[spark] class MesosClusterScheduler(
   }

   private def getDriverUris(desc: MesosDriverDescription): List[CommandInfo.URI] = {
+    val useFetchCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false) ||
+      desc.conf.getBoolean("spark.mesos.fetcherCache.enable", false)

Review comment: Although this is effectively the same, could you preserve the order of the boolean expressions to be consistent with the master branch?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on a change in pull request #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#discussion_r289253404

## File path: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala

@@ -430,6 +429,9 @@ private[spark] class MesosClusterScheduler(
   }

   private def getDriverUris(desc: MesosDriverDescription): List[CommandInfo.URI] = {
+    val useFetchCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false) ||
+      desc.conf.getBoolean("spark.mesos.fetcherCache.enable", false)

Review comment: Ur, this one looks like the opposite direction. Could you check this again?
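The diff under review resolves the fetcher-cache flag from two places: the cluster scheduler's own conf and the individual driver submission's conf, enabling the cache if either side sets it. A minimal standalone model of that lookup, with hypothetical names (`boolFlag`, `useFetcherCache`) and plain `Map`s standing in for `SparkConf`:

```scala
// The configuration key from the diff above.
val FetcherCacheKey = "spark.mesos.fetcherCache.enable"

// Read a boolean flag from a conf map, falling back to a default when unset
// (a simplified stand-in for SparkConf.getBoolean).
def boolFlag(conf: Map[String, String], key: String, default: Boolean): Boolean =
  conf.get(key).map(_.toBoolean).getOrElse(default)

// The fetcher cache is used if either the scheduler-level conf or the
// per-submission conf enables it.
def useFetcherCache(schedulerConf: Map[String, String],
                    submissionConf: Map[String, String]): Boolean =
  boolFlag(schedulerConf, FetcherCacheKey, default = false) ||
    boolFlag(submissionConf, FetcherCacheKey, default = false)
```

Because `||` is commutative here, swapping the two operands is behaviorally identical, which is why the review asks only for consistency with the master branch rather than a functional change.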
[GitHub] [spark] SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
SparkQA commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570801 **[Test build #105990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105990/testReport)** for PR 24750 at commit [`acbace0`](https://github.com/apache/spark/commit/acbace036f05f554221d45f7138c5a2861c90d5e).
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570445 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24750#issuecomment-497570447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11240/
[GitHub] [spark] dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4]
dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497570054 Thank you for making a new PR, @mwlon.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569432 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring
dongjoon-hyun commented on a change in pull request #24747: [SPARK-27772][SQL][TEST] SQLTestUtils Refactoring URL: https://github.com/apache/spark/pull/24747#discussion_r289252522

## File path: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala

@@ -255,59 +255,65 @@ private[sql] trait SQLTestUtilsBase
    * Drops temporary view `viewNames` after calling `f`.
    */
   protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
   }

   /**
    * Drops global temporary view `viewNames` after calling `f`.
    */
   protected def withGlobalTempView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      // If the test failed part way, we don't want to mask the failure by failing to remove
-      // global temp views that never got created.
-      try viewNames.foreach(spark.catalog.dropGlobalTempView) catch {
-        case _: NoSuchTableException =>
-      }
-    }
+    tryWithFinally(f)(viewNames.foreach(spark.catalog.dropGlobalTempView))
   }

   /**
    * Drops table `tableName` after calling `f`.
    */
   protected def withTable(tableNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      tableNames.foreach { name =>
-        spark.sql(s"DROP TABLE IF EXISTS $name")
-      }
-    }
+    tryWithFinally(f)(tableNames.foreach { name =>
+      spark.sql(s"DROP TABLE IF EXISTS $name")
+    })
   }

   /**
    * Drops view `viewName` after calling `f`.
    */
   protected def withView(viewNames: String*)(f: => Unit): Unit = {
-    try f finally {
+    tryWithFinally(f)(
       viewNames.foreach { name =>
         spark.sql(s"DROP VIEW IF EXISTS $name")
       }
-    }
+    )
   }

   /**
    * Drops cache `cacheName` after calling `f`.
    */
   protected def withCache(cacheNames: String*)(f: => Unit): Unit = {
-    try f finally {
-      cacheNames.foreach { cacheName =>
-        try uncacheTable(cacheName) catch {
-          case _: AnalysisException =>
+    tryWithFinally(f)(cacheNames.foreach(uncacheTable))
+  }
+
+  /**
+   * Executes the given tryBlock and then the given finallyBlock no matter whether tryBlock throws
+   * an exception. If both tryBlock and finallyBlock throw exceptions, the exception thrown
+   * from the finallyBlock will be added to the exception thrown from tryBlock as a
+   * suppressed exception. It helps to avoid masking the exception from tryBlock with the
+   * exception from the finallyBlock.
+   */
+  private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {

Review comment: Unfortunately, the proposed one looks like the existing [tryWithSafeFinally](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1347). Does this have a new feature compared to the existing `tryWithSafeFinally`?
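The suppressed-exception behavior discussed in this review (run the cleanup block even when the test body throws, and attach any cleanup failure to the original exception rather than masking it) can be sketched as follows. This is an illustrative, self-contained sketch, not Spark's actual `tryWithFinally` or `Utils.tryWithSafeFinally` implementation:

```scala
// Minimal sketch of the "finally without masking" pattern: if both the
// try block and the finally block throw, the finally-side exception is
// recorded via Throwable.addSuppressed on the try-side exception, so the
// original test failure still propagates.
object TryFinallyDemo {
  def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
    var primary: Throwable = null
    try {
      tryBlock
    } catch {
      case t: Throwable =>
        primary = t
        throw t
    } finally {
      if (primary != null) {
        // The try block already failed: never let cleanup mask that failure.
        try {
          finallyBlock
        } catch {
          case t: Throwable => primary.addSuppressed(t)
        }
      } else {
        // The try block succeeded: a cleanup failure may propagate normally.
        finallyBlock
      }
    }
  }
}
```

When both blocks throw, the caller sees the try-side exception, with the cleanup exception available through `getSuppressed`.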
[GitHub] [spark] dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4]
dongjoon-hyun commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569944 ok to test
[GitHub] [spark] BryanCutler commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
BryanCutler commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-497569777 > BTW, is the buffer size 65536 bytes? So .. the issue is that we should wait until 65536 bytes is full? Why don't we simply add a config to control the buffer size then? Yes, I think this is the right approach if there is too much latency between endpoints. There is already a config for the Scala side, `spark.buffer.size`.
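For reference, the Scala-side buffer size mentioned here is an ordinary Spark property and can be tuned in `spark-defaults.conf`. A hedged sketch (the 131072 value is illustrative; 65536 bytes is the default under discussion):

```
# spark-defaults.conf fragment: raise the I/O buffer size from its
# 65536-byte default, e.g. to trade latency for throughput.
spark.buffer.size  131072
```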
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569126 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569432 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins removed a comment on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569044 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569126 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4]
AmplabJenkins commented on issue #24750: [SPARK-26192][MESOS][2.4] URL: https://github.com/apache/spark/pull/24750#issuecomment-497569044 Can one of the admins verify this patch?
[GitHub] [spark] mwlon opened a new pull request #24750: functional changes for SPARK-26192
mwlon opened a new pull request #24750: functional changes for SPARK-26192 URL: https://github.com/apache/spark/pull/24750 ## What changes were proposed in this pull request? Let the Mesos fetcher cache option come from the submission as well as the dispatcher. ## How was this patch tested? Existing unit tests and a new one.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568162 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins removed a comment on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568167 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11239/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568162 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
AmplabJenkins commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497568167 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11239/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567799 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105986/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105986/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567799 Merged build finished. Test PASSed.
[GitHub] [spark] felixcheung commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
felixcheung commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-497567724 ok
[GitHub] [spark] SparkQA commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
SparkQA commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497567530 **[Test build #105986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105986/testReport)** for PR 24374 at commit [`a3c1db5`](https://github.com/apache/spark/commit/a3c1db51da4e6f426aa978faff9449f331f56dec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
SparkQA removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497548036 **[Test build #105986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105986/testReport)** for PR 24374 at commit [`a3c1db5`](https://github.com/apache/spark/commit/a3c1db51da4e6f426aa978faff9449f331f56dec).
[GitHub] [spark] dongjoon-hyun commented on issue #24695: [SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency
dongjoon-hyun commented on issue #24695: [SPARK-27831][SQL][TEST][test-hadoop3.2] Move Hive test jars to maven dependency URL: https://github.com/apache/spark/pull/24695#issuecomment-497567318 Thank you, @wangyum. Yes, please make a PR with that approach, @wangyum. This time, let's test with all combinations; last time, we tested SBT (hadoop-2.7/hadoop-3.2) only. @srowen, unfortunately, this patch has already been reverted from master, and it has been verified in another PR (here, https://github.com/apache/spark/pull/24745#pullrequestreview-244020056) to pass Maven. - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105971/ The last Jenkins run passed the Hive UT but failed at some YARN UTs. I guess the next run will pass.
[GitHub] [spark] SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages
SparkQA commented on issue #24497: [SPARK-27630][CORE] Properly handle task end events from completed stages URL: https://github.com/apache/spark/pull/24497#issuecomment-497567324 **[Test build #105989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105989/testReport)** for PR 24497 at commit [`da5e156`](https://github.com/apache/spark/commit/da5e15635f8b762d666a5fb5f0eb3a9bc7c13c6d).
[GitHub] [spark] felixcheung commented on issue #24731: [SPARK-27725][EXAMPLES] Add an example discovery Script for GPU resources
felixcheung commented on issue #24731: [SPARK-27725][EXAMPLES] Add an example discovery Script for GPU resources URL: https://github.com/apache/spark/pull/24731#issuecomment-497566753 script sounds good
[GitHub] [spark] turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block
turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block URL: https://github.com/apache/spark/pull/24740#issuecomment-497559772 > You shouldn't have this limit anymore (from spark 2.4 onwards) as long as you're also running a recent shuffle service. this uses fetching shuffle blocks to disk, instead of memory, which should be enabled by default for large blocks (https://issues.apache.org/jira/browse/SPARK-24297). > > But if you're seeing a failure with that, can you share some more details? Thanks. I see this failure with spark-2.3.2. I'm sorry that I did not notice your PR, which sets maxRemoteBlockSizeFetchToMem to a value smaller than 2GB. But when resources are available, it is a good idea to set this value larger than 2GB to reduce the I/O overhead. So, shall we support shuffle data transmission regardless of the shuffle blocks' size when maxRemoteBlockSizeFetchToMem is larger than 2GB?
[GitHub] [spark] turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block
turboFei edited a comment on issue #24740: [SPARK-27876][CORE] Split large shuffle partition to multi-segments to enable transfer oversize shuffle partition block URL: https://github.com/apache/spark/pull/24740#issuecomment-497559772 > You shouldn't have this limit anymore (from spark 2.4 onwards) as long as you're also running a recent shuffle service. this uses fetching shuffle blocks to disk, instead of memory, which should be enabled by default for large blocks (https://issues.apache.org/jira/browse/SPARK-24297). > > But if you're seeing a failure with that, can you share some more details? Thanks. I see this failure with spark-2.3.2. I'm sorry that I did not notice your PR, which sets maxRemoteBlockSizeFetchToMem to a value smaller than 2GB. But when resources are available, it is a good idea to set this value larger than 2GB to reduce the I/O overhead. So, shall we support shuffle data transmission when maxRemoteBlockSizeFetchToMem is larger than 2GB?
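The threshold under discussion is an ordinary Spark property (`spark.maxRemoteBlockSizeFetchToMem`): remote shuffle blocks larger than this value are fetched to disk instead of memory. A hedged `spark-defaults.conf` sketch, with an illustrative value:

```
# spark-defaults.conf fragment: fetch remote blocks larger than 200m
# to disk rather than buffering them in memory.
spark.maxRemoteBlockSizeFetchToMem  200m
```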
[GitHub] [spark] AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins commented on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-497562069 Merged build finished. Test PASSed.