[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30389: [SPARK-33143][PYTHON] Add configurable timeout to python server and client
gaborgsomogyi commented on a change in pull request #30389: URL: https://github.com/apache/spark/pull/30389#discussion_r525876645

## File path: core/src/test/scala/org/apache/spark/api/python/PythonRDDSuite.scala

@@ -76,12 +79,22 @@ class PythonRDDSuite extends SparkFunSuite with LocalSparkContext {
   }

   test("python server error handling") {
-    val authHelper = new SocketAuthHelper(new SparkConf())
-    val errorServer = new ExceptionPythonServer(authHelper)
-    val client = new Socket(InetAddress.getLoopbackAddress(), errorServer.port)
-    authHelper.authToServer(client)
-    val ex = intercept[Exception] { errorServer.getResult(Duration(1, "second")) }
-    assert(ex.getCause().getMessage().contains("exception within handleConnection"))
+    val savedSparkEnv = SparkEnv.get
+    try {
+      val conf = new SparkConf()
+      val env = mock(classOf[SparkEnv])
+      doReturn(conf).when(env).conf
+      SparkEnv.set(env)

Review comment: It's not testing the newly added code in any way. The test blew up with an NPE because of the `SparkEnv.get` introduced in the server-side code.
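For context, a minimal sketch of the mock-and-restore pattern the quoted test relies on. The Mockito calls mirror the diff above; the `finally` restore and the commented call site are assumptions about the surrounding test, not the PR's exact code:

```scala
// Why the mock is needed: the new server-side timeout code reads
// SparkEnv.get.conf, so a test without a SparkEnv hits an NPE.
import org.apache.spark.{SparkConf, SparkEnv}
import org.mockito.Mockito.{doReturn, mock}

val savedSparkEnv = SparkEnv.get
try {
  val env = mock(classOf[SparkEnv])
  doReturn(new SparkConf()).when(env).conf   // serve a real SparkConf from the mock
  SparkEnv.set(env)
  // ... exercise the Python server code that calls SparkEnv.get.conf ...
} finally {
  SparkEnv.set(savedSparkEnv)                // restore so other tests see the real env
}
```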
[GitHub] [spark] SparkQA commented on pull request #30406: [SPARK-33473][SQL] Extend interpreted subexpression elimination to other interpreted projections
SparkQA commented on pull request #30406: URL: https://github.com/apache/spark/pull/30406#issuecomment-729502912

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35861/
[GitHub] [spark] gaborgsomogyi commented on pull request #30336: [SPARK-33287][SS][UI]Expose state custom metrics information on SS UI
gaborgsomogyi commented on pull request #30336: URL: https://github.com/apache/spark/pull/30336#issuecomment-729502108

> O.K, it seems to be better to document how to get the list of custom metrics.

I'll add it to the doc of the new parameter.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30336: [SPARK-33287][SS][UI]Expose state custom metrics information on SS UI
gaborgsomogyi commented on a change in pull request #30336: URL: https://github.com/apache/spark/pull/30336#discussion_r525874550

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala

@@ -236,12 +239,61 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
         {graphUIDataForNumRowsDroppedByWatermark.generateTimelineHtml(jsCollector)}
         {graphUIDataForNumRowsDroppedByWatermark.generateHistogramHtml(jsCollector)}
-      // scalastyle:on
+      // scalastyle:on
+
+      result ++= generateAggregatedCustomMetrics(query, minBatchTime, maxBatchTime, jsCollector)
+      result
     } else {
       new NodeBuffer()
     }
   }
+
+  def generateAggregatedCustomMetrics(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {
+    val result: NodeBuffer = new NodeBuffer
+
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress.stateOperators.nonEmpty)
+    val enabledCustomMetrics = parent.parent.conf.get(ENABLED_STREAMING_UI_CUSTOM_METRIC_LIST)
+    logDebug(s"Enabled custom metrics: $enabledCustomMetrics")
+    query.lastProgress.stateOperators.head.customMetrics.keySet().asScala
+      .filter(enabledCustomMetrics.contains(_)).map { metricName =>
+        val data = query.recentProgress.map(p => (parseProgressTimestamp(p.timestamp),
+          p.stateOperators.map(_.customMetrics.get(metricName).toDouble).sum))
+        val max = data.maxBy(_._2)._2
+
+        val graphUIData =
+          new GraphUIData(
+            s"aggregated-$metricName-timeline",
+            s"aggregated-$metricName-histogram",
+            data,
+            minBatchTime,
+            maxBatchTime,
+            0,
+            max,
+            "")

Review comment: Since we're not calling `init`, it doesn't behave the way you've mentioned. The restriction is simply not to put anything nasty in the constructor. I can live with it; I just wanted to highlight it. Since I don't sense any objection from @HeartSaVioR, I'm starting to add it.
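As a usage note for the feature under review, a hedged sketch of how a user would opt in to these aggregated custom metrics. The config key and metric names are assumptions based on this PR and the HDFS-backed state store provider; verify them against the merged docs:

```scala
// Sketch: opting in to state store custom metrics on the Streaming Query UI.
// Assumed key: "spark.sql.streaming.ui.enabledCustomMetricList" (a static
// conf, so it must be set before the session starts). The metric names are
// examples from the HDFS-backed state store and may differ in your build.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ss-ui-custom-metrics")
  .config("spark.sql.streaming.ui.enabledCustomMetricList",
    "stateOnCurrentVersionSizeBytes,loadedMapCacheHitCount")
  .getOrCreate()
```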
[GitHub] [spark] gengliangwang commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
gengliangwang commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-729501113

I'm sorry that I was focusing on other tasks and couldn't follow this thread. Thanks for the great work, @cchighman !
[GitHub] [spark] SparkQA commented on pull request #30373: [SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order
SparkQA commented on pull request #30373: URL: https://github.com/apache/spark/pull/30373#issuecomment-729501057

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35859/
[GitHub] [spark] AmplabJenkins commented on pull request #30373: [SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order
AmplabJenkins commented on pull request #30373: URL: https://github.com/apache/spark/pull/30373#issuecomment-729501074
[GitHub] [spark] AmplabJenkins commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl
AmplabJenkins commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729500060
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729500038

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35858/
[GitHub] [spark] SparkQA commented on pull request #30406: [SPARK-33473][SQL] Extend interpreted subexpression elimination to other interpreted projections
SparkQA commented on pull request #30406: URL: https://github.com/apache/spark/pull/30406#issuecomment-729499173

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35860/
[GitHub] [spark] AmplabJenkins commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl
AmplabJenkins commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729494363
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729494347

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35857/
[GitHub] [spark] xuzikun2003 commented on a change in pull request #29725: [SPARK-32096][SQL] Improve sorting performance of Spark SQL window function by removing window partition key from sort order
xuzikun2003 commented on a change in pull request #29725: URL: https://github.com/apache/spark/pull/29725#discussion_r525862140

## File path: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowWindowSorter.java

@@ -0,0 +1,453 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.Map.Entry;
+import java.util.Queue;
+import java.util.TreeMap;
+import java.io.IOException;
+import java.util.function.Supplier;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import scala.collection.Iterator;
+import scala.math.Ordering;
+
+import org.apache.spark.memory.SparkOutOfMemoryError;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.catalyst.expressions.UnsafeProjection;
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.util.collection.unsafe.sort.PrefixComparator;
+import org.apache.spark.util.collection.unsafe.sort.RecordComparator;
+
+public final class UnsafeExternalRowWindowSorter extends AbstractUnsafeExternalRowSorter {
+
+  private static final Logger logger = LoggerFactory.getLogger(UnsafeExternalRowWindowSorter.class);
+
+  private final StructType schema;
+  private final UnsafeProjection partitionSpecProjection;
+  private final Ordering orderingOfPartitionKey;
+  private final Ordering orderingInWindow;
+  private final Ordering orderingAcrossWindows;
+  private final PrefixComparator prefixComparatorInWindow;
+  private final UnsafeExternalRowSorter.PrefixComputer prefixComputerInWindow;
+  private final boolean canUseRadixSortInWindow;
+  private final long pageSizeBytes;
+  private static final int windowSorterMapMaxSize = 1;

Review comment: @maropu , @opensky142857 , here are the reasons why we set `windowSorterMapMaxSize` to 1 and why we reduce the page size of each sorter.

Each UnsafeExternalRowSorter uses a different memory consumer. Whenever you insert the first row into an UnsafeExternalRowSorter, the sorter's memory consumer allocates a whole page to it. In our perf run of TPC-DS 100TB, the default page size is 64MB. If we insert only a few rows into the sorter corresponding to a window, a lot of memory is wasted, and the unnecessary memory allocation also brings significant performance overhead. That is why we do two things in this PR: 1. keep the number of window sorters small; 2. decrease the page size of each window sorter.

To address this problem, there are actually two directions we could go. One direction is to let these window sorters share the same memory consumer, so we won't allocate many big pages into which very few rows are inserted; however, this direction requires a lot of engineering effort to refactor the code of UnsafeExternalSorter. The second direction is to keep only one window sorter for each physical partition.

Here is why we chose the second direction. When we run TPC-DS 100TB, we are not seeing the Spark engine being slow at sorting many windows within a physical partition; we are seeing it being slow at sorting a single window in a single physical partition (q67 is the case), where the executor does a lot of unnecessary comparisons on the window partition key. To address the slowness we observe, we keep only one window sorter for each physical partition. This single window sorter does not need to compare the window partition key, and thus it runs almost 2x faster (see the sketch below).

Perhaps I can rename these parameters to avoid confusion. What do you guys think?
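To make the comparison-cost argument concrete, here is a minimal, self-contained Scala sketch; it is illustrative only and uses a hypothetical `Row` type, not the PR's actual `UnsafeExternalRowWindowSorter`. Once rows are grouped by the window partition key, in-window sorting can compare the order key alone instead of the full (partition key, order key) pair:

```scala
// Illustrative sketch only -- hypothetical Row type, not Spark internals.
case class Row(partKey: Int, orderKey: Long)

val rows = Seq(Row(1, 9), Row(2, 3), Row(1, 4), Row(2, 8))

// Baseline: one big sort that compares both keys for every element pair.
val baseline = rows.sortBy(r => (r.partKey, r.orderKey))

// The PR's direction, in miniature: group by partition key once, then sort
// each window by orderKey alone -- each comparison is cheaper, and only one
// sorter (one memory consumer, one page) needs to be alive per partition.
val perWindow = rows.groupBy(_.partKey).toSeq.sortBy(_._1)
  .flatMap { case (_, window) => window.sortBy(_.orderKey) }

assert(baseline == perWindow)  // same output, fewer key comparisons in-window
```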
[GitHub] [spark] dongjoon-hyun commented on pull request #25870: [SPARK-27936][K8S] Support python deps
dongjoon-hyun commented on pull request #25870: URL: https://github.com/apache/spark/pull/25870#issuecomment-729492170

Gentle ping, @skonto .
[GitHub] [spark] gengliangwang commented on a change in pull request #30407: [SPARK-32852][SQL][Doc] Revise the documentation of spark.sql.hive.metastore.jars
gengliangwang commented on a change in pull request #30407: URL: https://github.com/apache/spark/pull/30407#discussion_r525863735

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -96,17 +96,18 @@ private[spark] object HiveUtils extends Logging {
     .createWithDefault("builtin")

   val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")
-    .doc(s"Comma separated URL of Hive jars, support both local and remote paths," +
-      s"Such as: " +
-      s" 1. file://path/to/jar/xxx.jar\n" +
-      s" 2. hdfs://nameservice/path/to/jar/xxx.jar\n" +
-      s" 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)\n" +
-      s" 4. [http/https/ftp]://path/to/jar/xxx.jar\n" +
-      s"Notice: `http/https/ftp` doesn't support wildcard, but other URLs support" +
-      s"nested path wildcard, Such as: " +
-      s" 1. file://path/to/jar/*, file://path/to/jar/*/*\n" +
-      s" 2. hdfs://nameservice/path/to/jar/*, hdfs://nameservice/path/to/jar/*/*\n" +
-      s"When ${HIVE_METASTORE_JARS.key} is set to `path`, we will use Hive jars configured by this")
+    .doc(s"""
+      | Comma-sperated paths of the jars that should be used to instantiate the HiveMetastoreClient.

Review comment: Thanks, updated
[GitHub] [spark] dongjoon-hyun commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
dongjoon-hyun commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729490935

Thank you, @viirya !
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30407: [SPARK-32852][SQL][Doc] Revise the documentation of spark.sql.hive.metastore.jars
dongjoon-hyun commented on a change in pull request #30407: URL: https://github.com/apache/spark/pull/30407#discussion_r525862572

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -96,17 +96,18 @@ private[spark] object HiveUtils extends Logging {
     .createWithDefault("builtin")

   val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")
-    .doc(s"Comma separated URL of Hive jars, support both local and remote paths," +
-      s"Such as: " +
-      s" 1. file://path/to/jar/xxx.jar\n" +
-      s" 2. hdfs://nameservice/path/to/jar/xxx.jar\n" +
-      s" 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)\n" +
-      s" 4. [http/https/ftp]://path/to/jar/xxx.jar\n" +
-      s"Notice: `http/https/ftp` doesn't support wildcard, but other URLs support" +
-      s"nested path wildcard, Such as: " +
-      s" 1. file://path/to/jar/*, file://path/to/jar/*/*\n" +
-      s" 2. hdfs://nameservice/path/to/jar/*, hdfs://nameservice/path/to/jar/*/*\n" +
-      s"When ${HIVE_METASTORE_JARS.key} is set to `path`, we will use Hive jars configured by this")
+    .doc(s"""
+      | Comma-sperated paths of the jars that should be used to instantiate the HiveMetastoreClient.
+      | This configuration is effective only when `{$HIVE_METASTORE_JARS.key}` is set as `path`.
+      | The paths can be any of the following format:
+      | 1. file://path/to/jar/foo.jar
+      | 2. hdfs://nameservice/path/to/jar/foo.jar
+      | 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)
+      | 4. [http/https/ftp]://path/to/jar/foo.jar
+      | All the above formats support wildcard except for the `http/https/ftp` URL. For example:

Review comment: Maybe, the following is shorter?
```
- All the above formats support wildcard except for the `http/https/ftp` URL. For example:
+ 1, 2, and 3 support wildcard. For example:
```
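To ground the discussion, a hedged sketch of how the two documented configs are meant to be used together. The metastore version and the HDFS path below are placeholders, and both keys are static confs per the quoted code, so they must be set before the session starts:

```scala
// Sketch: using external Hive metastore client jars via the `path` mode.
// The jar location is a made-up example; wildcards work for file/hdfs paths
// per the doc being revised, but not for http/https/ftp URLs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.hive.metastore.version", "2.3.7")   // placeholder version
  .config("spark.sql.hive.metastore.jars", "path")
  .config("spark.sql.hive.metastore.jars.path",
    "hdfs://nameservice/path/to/hive-jars/*")
  .enableHiveSupport()
  .getOrCreate()
```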
[GitHub] [spark] SparkQA commented on pull request #30373: [SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order
SparkQA commented on pull request #30373: URL: https://github.com/apache/spark/pull/30373#issuecomment-729490026

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35859/
[GitHub] [spark] xuzikun2003 commented on a change in pull request #29725: [SPARK-32096][SQL] Improve sorting performance of Spark SQL window function by removing window partition key from sort order
xuzikun2003 commented on a change in pull request #29725: URL: https://github.com/apache/spark/pull/29725#discussion_r525862140

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala

@@ -42,6 +43,139 @@ case class SortExec(
     global: Boolean,
     child: SparkPlan,
     testSpillFrequency: Int = 0)
+  extends SortExecBase(
+    sortOrder,
+    global,
+    child,
+    testSpillFrequency) {
+
+  def createSorter(): UnsafeExternalRowSorter = {

Review comment: Sure, will add.

## File path:
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30407: [SPARK-32852][SQL][Doc] Revise the documentation of spark.sql.hive.metastore.jars
dongjoon-hyun commented on a change in pull request #30407: URL: https://github.com/apache/spark/pull/30407#discussion_r525861673

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -96,17 +96,18 @@ private[spark] object HiveUtils extends Logging {
     .createWithDefault("builtin")

   val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")
-    .doc(s"Comma separated URL of Hive jars, support both local and remote paths," +
-      s"Such as: " +
-      s" 1. file://path/to/jar/xxx.jar\n" +
-      s" 2. hdfs://nameservice/path/to/jar/xxx.jar\n" +
-      s" 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)\n" +
-      s" 4. [http/https/ftp]://path/to/jar/xxx.jar\n" +
-      s"Notice: `http/https/ftp` doesn't support wildcard, but other URLs support" +
-      s"nested path wildcard, Such as: " +
-      s" 1. file://path/to/jar/*, file://path/to/jar/*/*\n" +
-      s" 2. hdfs://nameservice/path/to/jar/*, hdfs://nameservice/path/to/jar/*/*\n" +
-      s"When ${HIVE_METASTORE_JARS.key} is set to `path`, we will use Hive jars configured by this")
+    .doc(s"""
+      | Comma-sperated paths of the jars that should be used to instantiate the HiveMetastoreClient.

Review comment: Could you revert from `sperated` (this) to `separated` (original)?
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729489017

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35858/
[GitHub] [spark] LuciferYang edited a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang edited a comment on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729487796

It seems all cases have been fixed except for the `RDD.isEmpty`-related ones.
[GitHub] [spark] LuciferYang commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729487796

It seems all cases are fixed except for the `RDD.isEmpty`-related ones.
[GitHub] [spark] gengliangwang commented on a change in pull request #30407: [SPARK-32852][SQL][Doc] Revise the documentation of spark.sql.hive.metastore.jars
gengliangwang commented on a change in pull request #30407: URL: https://github.com/apache/spark/pull/30407#discussion_r525860237

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -95,18 +95,19 @@ private[spark] object HiveUtils extends Logging {
     .stringConf
     .createWithDefault("builtin")

-  val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")
-    .doc(s"Comma separated URL of Hive jars, support both local and remote paths," +
-      s"Such as: " +
-      s" 1. file://path/to/jar/xxx.jar\n" +
-      s" 2. hdfs://nameservice/path/to/jar/xxx.jar\n" +
-      s" 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)\n" +
-      s" 4. [http/https/ftp]://path/to/jar/xxx.jar\n" +
-      s"Notice: `http/https/ftp` doesn't support wildcard, but other URLs support" +
-      s"nested path wildcard, Such as: " +
-      s" 1. file://path/to/jar/*, file://path/to/jar/*/*\n" +
-      s" 2. hdfs://nameservice/path/to/jar/*, hdfs://nameservice/path/to/jar/*/*\n" +
-      s"When ${HIVE_METASTORE_JARS.key} is set to `path`, we will use Hive jars configured by this")
+val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")

Review comment: Thanks, updated
[GitHub] [spark] dongjoon-hyun commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
dongjoon-hyun commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729487327

Sorry, but what do you mean?

> It seems that except for RDD.isEmpty has been fixed.
[GitHub] [spark] LuciferYang commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729486540

@dongjoon-hyun It seems that except for `RDD.isEmpty` has been fixed.
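For readers outside the thread, a small sketch of the Scala 2.13 warning this PR is cleaning up, using a stand-in class rather than Spark's actual `RDD`:

```scala
// `RDD.isEmpty` is declared with an empty parameter list, so calling it
// without parentheses "auto-applies" () and warns under Scala 2.13.
class FakeRdd {                  // stand-in for org.apache.spark.rdd.RDD
  def isEmpty(): Boolean = true
}

val rdd = new FakeRdd
val a = rdd.isEmpty              // 2.13 warning: "Auto-application to `()` is deprecated"
val b = rdd.isEmpty()            // the fixed form: apply () explicitly
```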
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30407: [SPARK-32852][SQL][Doc] Revise the documentation of spark.sql.hive.metastore.jars
dongjoon-hyun commented on a change in pull request #30407: URL: https://github.com/apache/spark/pull/30407#discussion_r525859433

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -95,18 +95,19 @@ private[spark] object HiveUtils extends Logging {
     .stringConf
     .createWithDefault("builtin")

-  val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")
-    .doc(s"Comma separated URL of Hive jars, support both local and remote paths," +
-      s"Such as: " +
-      s" 1. file://path/to/jar/xxx.jar\n" +
-      s" 2. hdfs://nameservice/path/to/jar/xxx.jar\n" +
-      s" 3. /path/to/jar/ (path without URI scheme follow conf `fs.defaultFS`'s URI schema)\n" +
-      s" 4. [http/https/ftp]://path/to/jar/xxx.jar\n" +
-      s"Notice: `http/https/ftp` doesn't support wildcard, but other URLs support" +
-      s"nested path wildcard, Such as: " +
-      s" 1. file://path/to/jar/*, file://path/to/jar/*/*\n" +
-      s" 2. hdfs://nameservice/path/to/jar/*, hdfs://nameservice/path/to/jar/*/*\n" +
-      s"When ${HIVE_METASTORE_JARS.key} is set to `path`, we will use Hive jars configured by this")
+val HIVE_METASTORE_JARS_PATH = buildStaticConf("spark.sql.hive.metastore.jars.path")

Review comment: indentation?
[GitHub] [spark] dongjoon-hyun commented on pull request #30391: [SPARK-33464][INFRA] Add/remove (un)necessary cache and restructure GitHub Actions yaml
dongjoon-hyun commented on pull request #30391: URL: https://github.com/apache/spark/pull/30391#issuecomment-729485211

The position is up to the agreement between @HyukjinKwon and @mridulm .
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting externa
AmplabJenkins removed a comment on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729483958
[GitHub] [spark] AmplabJenkins commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl
AmplabJenkins commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729483958
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729483939

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35856/
[GitHub] [spark] HeartSaVioR commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
HeartSaVioR commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-729483537

Thanks @cchighman for your great efforts so far, and sorry for making you struggle with the review process. I'll take this over based on the current state of the PR and address my own comments.
[GitHub] [spark] gengliangwang commented on pull request #30407: [SPARK-32852][Doc] Revise the documentation of spark.sql.hive.metastore.jars
gengliangwang commented on pull request #30407: URL: https://github.com/apache/spark/pull/30407#issuecomment-729483340

cc @AngersZh @cloud-fan
[GitHub] [spark] gengliangwang opened a new pull request #30407: [SPARK-32852][Doc] Revise the documentation of spark.sql.hive.metastore.jars
gengliangwang opened a new pull request #30407: URL: https://github.com/apache/spark/pull/30407

### What changes were proposed in this pull request?
This is a follow-up for https://github.com/apache/spark/pull/29881. It revises the documentation of the configuration `spark.sql.hive.metastore.jars`.

### Why are the changes needed?
To fix grammatical errors in the doc and to make it clearer that the configuration is effective only when `spark.sql.hive.metastore.jars` is set to `path`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Just doc changes.
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729480460 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35857/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
AmplabJenkins removed a comment on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729475604 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35855/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
AmplabJenkins removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-729471388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131246/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
SparkQA removed a comment on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729434913 **[Test build #131252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131252/testReport)** for PR 30355 at commit [`48b2814`](https://github.com/apache/spark/commit/48b28146033b292f6b856d2443bf5b562854066a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
SparkQA removed a comment on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729391035 **[Test build #131244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131244/testReport)** for PR 30234 at commit [`2082c3b`](https://github.com/apache/spark/commit/2082c3bd40a77c9083ec658045f9588a76c1ef60). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
AmplabJenkins removed a comment on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729477538 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
SparkQA removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-72944 **[Test build #131246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131246/testReport)** for PR 30403 at commit [`a0687b3`](https://github.com/apache/spark/commit/a0687b3fc8ba5bd9ac19c3a0374cf855e8493f78). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30351: [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
AmplabJenkins removed a comment on pull request #30351: URL: https://github.com/apache/spark/pull/30351#issuecomment-729469598 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode
AmplabJenkins removed a comment on pull request #30260: URL: https://github.com/apache/spark/pull/30260#issuecomment-729466252 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
AmplabJenkins removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-729471375 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
AmplabJenkins removed a comment on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729473406 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
AmplabJenkins removed a comment on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729460845 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30393: [SPARK-24554][PYTHON][SQL] Add MapType support for PySpark with Arrow
SparkQA commented on pull request #30393: URL: https://github.com/apache/spark/pull/30393#issuecomment-729477702 **[Test build #131260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131260/testReport)** for PR 30393 at commit [`b257470`](https://github.com/apache/spark/commit/b257470bccb363ca1ae10a96126c3d49d10ad8c6). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
AmplabJenkins commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729477538 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30312: [WIP][SPARK-32917][SHUFFLE][CORE][test-maven][test-hadoop2.7] Adds support for executors to push shuffle blocks after successful m
otterc commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r525853164 ## File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleSupport.scala ## @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +import java.io.File +import java.net.ConnectException +import java.nio.ByteBuffer +import java.util.concurrent.ExecutorService + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} + +import com.google.common.base.Throwables + +import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv} +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.launcher.SparkLauncher +import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer} +import org.apache.spark.network.netty.SparkTransportConf +import org.apache.spark.network.shuffle.BlockFetchingListener +import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler +import org.apache.spark.network.util.TransportConf +import org.apache.spark.shuffle.PushShuffleSupport._ +import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId} +import org.apache.spark.util.{ThreadUtils, Utils} + +/** + * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled. + * When push shuffle is enabled, it is created after the shuffle writer finishes writing the shuffle + * file and initiates the block push process. + * + * @param dataFile mapper generated shuffle data file + * @param partitionLengths array of shuffle block size so we can tell shuffle block + * boundaries within the shuffle file + * @param dep shuffle dependency to get shuffle ID and the location of remote shuffle + * services to push local shuffle blocks + * @param partitionId map index of the shuffle map task + * @param conf spark configuration + */ +private[spark] class PushShuffleSupport( Review comment: I have renamed `PushShuffleSupport` to `PushShuffleComponent`. Will wait for more feedback before making any more changes to this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
SparkQA commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-729476281 **[Test build #131244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131244/testReport)** for PR 30234 at commit [`2082c3b`](https://github.com/apache/spark/commit/2082c3bd40a77c9083ec658045f9588a76c1ef60). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
AmplabJenkins commented on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729475594 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
SparkQA commented on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729475571 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35855/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
SparkQA commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729473677 **[Test build #131259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131259/testReport)** for PR 30405 at commit [`8ca8624`](https://github.com/apache/spark/commit/8ca8624d43c530cda5ebb7a1a26c52174f908496). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
AmplabJenkins commented on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729473406 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
SparkQA commented on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729472700 **[Test build #131252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131252/testReport)** for PR 30355 at commit [`48b2814`](https://github.com/apache/spark/commit/48b28146033b292f6b856d2443bf5b562854066a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 commented on a change in pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
imback82 commented on a change in pull request #30403: URL: https://github.com/apache/spark/pull/30403#discussion_r525849567 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala ## @@ -113,7 +113,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto e = intercept[AnalysisException] { sql("UNCACHE TABLE nonexistentTable") }.getMessage -assert(e.contains(s"$expectedErrorMsg default.nonexistentTable")) +assert(e.contains(s"$expectedErrorMsg nonexistentTable")) Review comment: This is now consistent with Line 108 above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on a change in pull request #30393: [SPARK-24554][PYTHON][SQL] Add MapType support for PySpark with Arrow
BryanCutler commented on a change in pull request #30393: URL: https://github.com/apache/spark/pull/30393#discussion_r525849381 ## File path: python/docs/source/user_guide/arrow_pandas.rst ## @@ -341,7 +341,7 @@ Supported SQL Types .. currentmodule:: pyspark.sql.types -Currently, all Spark SQL data types are supported by Arrow-based conversion except :class:`MapType`, +Currently, all Spark SQL data types are supported by Arrow-based conversion except Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30406: [SPARK-33473][SQL] Extend interpreted subexpression elimination to other interpreted projections
SparkQA commented on pull request #30406: URL: https://github.com/apache/spark/pull/30406#issuecomment-729471571 **[Test build #131258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131258/testReport)** for PR 30406 at commit [`ee4b070`](https://github.com/apache/spark/commit/ee4b070e64c24b174a37d4f730cc43867972e170). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
AmplabJenkins commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-729471375 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Migrate CACHE/UNCACHE TABLE command to use UnresolvedTableOrView to resolve the identifier
SparkQA commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-729470662 **[Test build #131246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131246/testReport)** for PR 30403 at commit [`a0687b3`](https://github.com/apache/spark/commit/a0687b3fc8ba5bd9ac19c3a0374cf855e8493f78). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30351: [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
AmplabJenkins commented on pull request #30351: URL: https://github.com/apache/spark/pull/30351#issuecomment-729469598 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30351: [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
SparkQA commented on pull request #30351: URL: https://github.com/apache/spark/pull/30351#issuecomment-729469567 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35854/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
SparkQA commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729469271 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35856/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on a change in pull request #30393: [SPARK-24554][PYTHON][SQL] Add MapType support for PySpark with Arrow
BryanCutler commented on a change in pull request #30393: URL: https://github.com/apache/spark/pull/30393#discussion_r525847003 ## File path: python/docs/source/user_guide/arrow_pandas.rst ## @@ -341,7 +341,7 @@ Supported SQL Types .. currentmodule:: pyspark.sql.types -Currently, all Spark SQL data types are supported by Arrow-based conversion except :class:`MapType`, +Currently, all Spark SQL data types are supported by Arrow-based conversion except Review comment: I should probably mention that MapType is supported only with pyarrow 2.0.0 and above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on a change in pull request #30393: [SPARK-24554][PYTHON][SQL] Add MapType support for PySpark with Arrow
BryanCutler commented on a change in pull request #30393: URL: https://github.com/apache/spark/pull/30393#discussion_r525846522 ## File path: python/pyspark/sql/pandas/types.py ## @@ -306,3 +322,23 @@ def _check_series_convert_timestamps_tz_local(s, timezone): `pandas.Series` where if it is a timestamp, has been converted to tz-naive """ return _check_series_convert_timestamps_localize(s, timezone, None) + + +def _convert_map_items_to_dict(s): Review comment: Note: these conversion functions are because pyarrow expects map items as a list of (key, value) pairs, and has this format when converting to Pandas also. The reason is that the arrow spec could allow for duplicate key values in a row, and doesn't say how these should be handled exactly. So by having these conversions, we match the non-arrow behavior for maps, with a dictionary as input/output. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
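BryanCutler's point about duplicate keys can be illustrated in Scala terms (the actual conversion in the PR operates on pandas/pyarrow data, so this is only an analogy):

```scala
// pyarrow hands map data over as a list of (key, value) pairs, which may
// contain duplicate keys; collapsing such a list into a dictionary silently
// keeps only the last value per key, which is why the conversion behavior
// has to be pinned down explicitly.
val items = Seq("a" -> 1, "a" -> 2, "b" -> 3)
val asMap = items.toMap // Map(a -> 2, b -> 3): the first "a" entry is lost
```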
[GitHub] [spark] AmplabJenkins commented on pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode
AmplabJenkins commented on pull request #30260: URL: https://github.com/apache/spark/pull/30260#issuecomment-729466252 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode
SparkQA commented on pull request #30260: URL: https://github.com/apache/spark/pull/30260#issuecomment-729466242 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35853/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30406: [SPARK-33473][SQL] Extend interpreted subexpression elimination to other interpreted projections
maropu commented on pull request #30406: URL: https://github.com/apache/spark/pull/30406#issuecomment-729464155 Looks fine if the tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30355: [SPARK-32907][ML][PYTHON] Adaptively blockify instances - AFT,LiR,LoR
SparkQA commented on pull request #30355: URL: https://github.com/apache/spark/pull/30355#issuecomment-729462035 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35855/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29725: [SPARK-32096][SQL] Improve sorting performance of Spark SQL window function by removing window partition key from sort order
maropu commented on a change in pull request #29725: URL: https://github.com/apache/spark/pull/29725#discussion_r525841241 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ## @@ -42,6 +43,139 @@ case class SortExec( global: Boolean, child: SparkPlan, testSpillFrequency: Int = 0) + extends SortExecBase( +sortOrder, +global, +child, +testSpillFrequency) { + + def createSorter(): UnsafeExternalRowSorter = { +rowSorter = UnsafeExternalRowSorter.create( + schema, ordering, prefixComparator, prefixComputer, pageSize, canUseRadixSort) + +if (testSpillFrequency > 0) { + rowSorter.setTestSpillFrequency(testSpillFrequency) +} +rowSorter.asInstanceOf[UnsafeExternalRowSorter] + } + + override protected def doProduce(ctx: CodegenContext): String = { +doProduce(ctx, classOf[UnsafeExternalRowSorter].getName) + } +} + +/** + * Performs (external) sorting for multiple windows. + * + * @param partitionSpec a sequence of expressions that defines a partition key + * @param sortOrderInWindow a sequence of sort orders for sorting rows inside a window + * @param sortOrderAcrossWindows a sequence of sort orders for sorting rows across + * different windows on a Spark physical partition. + * This sequence of sort orders is obtained from a partition + * key plus a sequence of sort orders inside a window + * @param global when true performs a global sort of all partitions by shuffling the data first + * if necessary. + * @param testSpillFrequency Method for configuring periodic spilling in unit tests. If set, will + * spill every `frequency` records. + */ +case class WindowSortExec( +partitionSpec: Seq[Expression], +sortOrderInWindow: Seq[SortOrder], +sortOrderAcrossWindows: Seq[SortOrder], +global: Boolean, +child: SparkPlan, +testSpillFrequency: Int = 0) + extends SortExecBase( +sortOrderAcrossWindows, +global, +child, +testSpillFrequency) { + Review comment: Could you add `assert(partitionSpec.nonEmpty && sortOrderInWindow.nonEmpty, "XXX")` here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
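A minimal sketch of the guard maropu requests; `Seq[String]` stands in for the real `Expression`/`SortOrder` types, and the assertion message is invented:

```scala
// Guard in the constructor body, per the review request: WindowSortExec only
// makes sense when both the partition spec and the in-window order are given.
case class WindowSortArgs(partitionSpec: Seq[String], sortOrderInWindow: Seq[String]) {
  assert(partitionSpec.nonEmpty && sortOrderInWindow.nonEmpty,
    "WindowSortExec requires a non-empty partition spec and in-window sort order")
}
```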
[GitHub] [spark] otterc commented on a change in pull request #30312: [WIP][SPARK-32917][SHUFFLE][CORE][test-maven][test-hadoop2.7] Adds support for executors to push shuffle blocks after successful m
otterc commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r525838001 ## File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleSupport.scala ## @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +import java.io.File +import java.net.ConnectException +import java.nio.ByteBuffer +import java.util.concurrent.ExecutorService + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} + +import com.google.common.base.Throwables + +import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv} +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.launcher.SparkLauncher +import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer} +import org.apache.spark.network.netty.SparkTransportConf +import org.apache.spark.network.shuffle.BlockFetchingListener +import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler +import org.apache.spark.network.util.TransportConf +import org.apache.spark.shuffle.PushShuffleSupport._ +import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId} +import org.apache.spark.util.{ThreadUtils, Utils} + +/** + * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled. + * When push shuffle is enabled, it is created after the shuffle writer finishes writing the shuffle + * file and initiates the block push process. + * + * @param dataFile mapper generated shuffle data file + * @param partitionLengths array of shuffle block size so we can tell shuffle block + * boundaries within the shuffle file + * @param dep shuffle dependency to get shuffle ID and the location of remote shuffle + * services to push local shuffle blocks + * @param partitionId map index of the shuffle map task + * @param conf spark configuration + */ +private[spark] class PushShuffleSupport( Review comment: > XXXSupport is usually a trait or abstract class. I can rename it to be `PushShuffleComponent` > Would it be better to make PushShuffleSupport an abstract class which ShuffleWriter extends? The getPartitionLengths method can be the abstract method in this abstract class instead of inside ShuffleWriter I can add another interface which let's say looks like this and ShuffleWriter can extend it ``` interface PushEnabledShuffleWriter { getPartitionLengths() initiateBlockPush(...) } ``` Not sure if the interface is really needed since ShuffleWriter is the only base writer to which we are adding this ability. Maybe we can just add back the `initiateBlockPush` method to ShuffleWriter. 
> All the other class fields in this class can be converted into method arguments for initiateBlockPush. This way, we would still be able to initiate block push via the ShuffleWriter objects instead of relying on instantiating an object of PushShuffleSupport. With `initiateBlockPush` in `ShuffleWriter`, block push is still triggered via the shuffle writer; it will, however, delegate the work to `PushShuffleComponent`. @Victsm @Ngone51 @mridulm Please let me know what you think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
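Spelled out as compilable Scala, the delegation otterc sketches might look roughly like this; apart from `getPartitionLengths` and `initiateBlockPush`, every name here is a hypothetical stand-in rather than the PR's actual code:

```scala
// Stand-in for the renamed PushShuffleComponent: owns the actual push logic.
trait PushShuffleComponentLike {
  def start(partitionLengths: Array[Long]): Unit
}

// The writer stays the caller-facing entry point for pushing blocks, but
// hands the work to the push component instead of doing it itself.
trait PushEnabledShuffleWriter {
  def getPartitionLengths(): Array[Long]

  def initiateBlockPush(component: PushShuffleComponentLike): Unit =
    component.start(getPartitionLengths())
}
```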
[GitHub] [spark] dongjoon-hyun commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
dongjoon-hyun commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729461077 All comments are addressed. (@HyukjinKwon, @viirya, @mridulm) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
AmplabJenkins commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729460845 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
SparkQA commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729460822 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35852/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29725: [SPARK-32096][SQL] Improve sorting performance of Spark SQL window function by removing window partition key from sort order
maropu commented on a change in pull request #29725: URL: https://github.com/apache/spark/pull/29725#discussion_r525840579 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ## @@ -124,7 +125,18 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { if (SortOrder.orderingSatisfies(child.outputOrdering, requiredOrdering)) { child } else { -SortExec(requiredOrdering, global = false, child = child) +operator match { + case WindowExec(_, partitionSpec, orderSpec, _) +if (!partitionSpec.isEmpty && !orderSpec.isEmpty) => Review comment: nit: `isEmpty` -> `nonEmpty` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
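For completeness, the guard with the nit applied, reduced to a self-contained sketch in which `Seq[Int]` stands in for the real expression types:

```scala
// `p.nonEmpty` reads more directly than `!p.isEmpty` in a pattern guard.
def chooseSort(partitionSpec: Seq[Int], orderSpec: Seq[Int]): String =
  (partitionSpec, orderSpec) match {
    case (p, o) if p.nonEmpty && o.nonEmpty => "WindowSortExec"
    case _                                  => "SortExec"
  }
```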
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external
dongjoon-hyun edited a comment on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729460106 +1, LGTM. (only editorial comments.) Thank you all! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl
dongjoon-hyun commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-729460106 +1, LGTM. (only editorial comments.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29725: [SPARK-32096][SQL] Improve sorting performance of Spark SQL window function by removing window partition key from sort order
maropu commented on a change in pull request #29725: URL: https://github.com/apache/spark/pull/29725#discussion_r525834434 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowWindowSorter.java ## @@ -0,0 +1,453 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution; + +import com.google.common.annotations.VisibleForTesting; + +import java.util.Comparator; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.Map.Entry; +import java.util.Queue; +import java.util.TreeMap; +import java.io.IOException; +import java.util.function.Supplier; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import scala.collection.Iterator; +import scala.math.Ordering; + +import org.apache.spark.memory.SparkOutOfMemoryError; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.catalyst.expressions.UnsafeProjection; +import org.apache.spark.sql.catalyst.expressions.UnsafeRow; +import org.apache.spark.sql.types.StructType; +import org.apache.spark.util.collection.unsafe.sort.PrefixComparator; +import org.apache.spark.util.collection.unsafe.sort.RecordComparator; + +public final class UnsafeExternalRowWindowSorter extends AbstractUnsafeExternalRowSorter { + + private static final Logger logger = LoggerFactory.getLogger(UnsafeExternalRowWindowSorter.class); + + private final StructType schema; + private final UnsafeProjection partitionSpecProjection; + private final Ordering orderingOfPartitionKey; + private final Ordering orderingInWindow; + private final Ordering orderingAcrossWindows; + private final PrefixComparator prefixComparatorInWindow; + private final UnsafeExternalRowSorter.PrefixComputer prefixComputerInWindow; + private final boolean canUseRadixSortInWindow; + private final long pageSizeBytes; + private static final int windowSorterMapMaxSize = 1; + private static final int totalNumSorters = windowSorterMapMaxSize + 1; + private final HashMap windowSorterMap; Review comment: nit: `HashMap` ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ## @@ -42,6 +43,139 @@ case class SortExec( global: Boolean, child: SparkPlan, testSpillFrequency: Int = 0) + extends SortExecBase( +sortOrder, +global, +child, +testSpillFrequency) { + + def createSorter(): UnsafeExternalRowSorter = { +rowSorter = UnsafeExternalRowSorter.create( + schema, ordering, prefixComparator, prefixComputer, pageSize, canUseRadixSort) + +if (testSpillFrequency > 0) { + rowSorter.setTestSpillFrequency(testSpillFrequency) +} +rowSorter.asInstanceOf[UnsafeExternalRowSorter] + } + + override protected def doProduce(ctx: CodegenContext): String = { +doProduce(ctx, classOf[UnsafeExternalRowSorter].getName) + } +} + +/** + * Performs 
(external) sorting for multiple windows. Review comment: Could you leave some comments about how this differs from `SortExec`? ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowWindowSorter.java ## @@ -0,0 +1,453 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution; + +import com.google.common.annotations.VisibleForTesting; + +import java.util.Comparator; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.Map.Entry;
[GitHub] [spark] SparkQA commented on pull request #30351: [SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports
SparkQA commented on pull request #30351: URL: https://github.com/apache/spark/pull/30351#issuecomment-729458991 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35854/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30396: [MINOR][SQL][DOCS] Update schema_of_csv and schema_of_json doc
AmplabJenkins removed a comment on pull request #30396: URL: https://github.com/apache/spark/pull/30396#issuecomment-729457164 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30406: [SPARK-33473][SQL] Extend interpreted subexpression elimination to other interpreted projections
SparkQA commented on pull request #30406: URL: https://github.com/apache/spark/pull/30406#issuecomment-729457313 **[Test build #131257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131257/testReport)** for PR 30406 at commit [`37e3dca`](https://github.com/apache/spark/commit/37e3dca4d1240e37729efd764eb4b676feac0e65). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30396: [MINOR][SQL][DOCS] Update schema_of_csv and schema_of_json doc
AmplabJenkins commented on pull request #30396: URL: https://github.com/apache/spark/pull/30396#issuecomment-729457164 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #30312: [WIP][SPARK-32917][SHUFFLE][CORE][test-maven][test-hadoop2.7] Adds support for executors to push shuffle blocks after successful m
otterc commented on a change in pull request #30312: URL: https://github.com/apache/spark/pull/30312#discussion_r525838001 ## File path: core/src/main/scala/org/apache/spark/shuffle/PushShuffleSupport.scala ## @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +import java.io.File +import java.net.ConnectException +import java.nio.ByteBuffer +import java.util.concurrent.ExecutorService + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Queue} + +import com.google.common.base.Throwables + +import org.apache.spark.{ShuffleDependency, SparkConf, SparkEnv} +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.launcher.SparkLauncher +import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer, NioManagedBuffer} +import org.apache.spark.network.netty.SparkTransportConf +import org.apache.spark.network.shuffle.BlockFetchingListener +import org.apache.spark.network.shuffle.ErrorHandler.BlockPushErrorHandler +import org.apache.spark.network.util.TransportConf +import org.apache.spark.shuffle.PushShuffleSupport._ +import org.apache.spark.storage.{BlockId, BlockManagerId, ShufflePushBlockId} +import org.apache.spark.util.{ThreadUtils, Utils} + +/** + * Used for pushing shuffle blocks to remote shuffle services when push shuffle is enabled. + * When push shuffle is enabled, it is created after the shuffle writer finishes writing the shuffle + * file and initiates the block push process. + * + * @param dataFile mapper generated shuffle data file + * @param partitionLengths array of shuffle block size so we can tell shuffle block + * boundaries within the shuffle file + * @param dep shuffle dependency to get shuffle ID and the location of remote shuffle + * services to push local shuffle blocks + * @param partitionId map index of the shuffle map task + * @param conf spark configuration + */ +private[spark] class PushShuffleSupport( Review comment: > XXXSupport is usually a trait or abstract class. I can rename it to be `PushShuffleComponent` > Would it be better to make PushShuffleSupport an abstract class which ShuffleWriter extends? The getPartitionLengths method can be the abstract method in this abstract class instead of inside ShuffleWriter I can add another interface which let's say looks like this and ShuffleWriter can extend it ``` interface PushEnabledShuffleWriter { getPartitionLengths() initiateBlockPush(...) } ``` Not sure if the interface is really needed since ShuffleWriter is the only base writer to which we are adding this ability. Maybe we can just add back the `initiateBlockPush` method to ShuffleWriter. 
> All the other class fields in this class can be converted into method arguments for initiateBlockPush. This way, we would still be able to initiate block push via the ShuffleWriter objects instead of relying on instantiating an object of PushShuffleSupport. With the above interface, `initiateBlockPush` of `ShuffleWriter` can delegate the push to `PushShuffleComponent`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30396: [MINOR][SQL][DOCS] Update schema_of_csv and schema_of_json doc
SparkQA removed a comment on pull request #30396: URL: https://github.com/apache/spark/pull/30396#issuecomment-729321498 **[Test build #131239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131239/testReport)** for PR 30396 at commit [`1224ba9`](https://github.com/apache/spark/commit/1224ba9f6d27d3b3502e7ee64ff7ec53cb9bd7da). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30396: [MINOR][SQL][DOCS] Update schema_of_csv and schema_of_json doc
SparkQA commented on pull request #30396: URL: https://github.com/apache/spark/pull/30396#issuecomment-729456105 **[Test build #131239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131239/testReport)** for PR 30396 at commit [`1224ba9`](https://github.com/apache/spark/commit/1224ba9f6d27d3b3502e7ee64ff7ec53cb9bd7da). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
dongjoon-hyun commented on a change in pull request #30405: URL: https://github.com/apache/spark/pull/30405#discussion_r525837460 ## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ## @@ -110,7 +110,9 @@ private[spark] class Executor( .build() Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor] } - private val executorSource = new ExecutorSource(threadPool, executorId) + private val schemes = conf.get(EXECUTOR_METRICS_FILESYSTEM_SCHEMES) +.toLowerCase(Locale.ROOT).split(",").map(_.trim) Review comment: Sure, thanks, @mridulm!
[GitHub] [spark] mridulm commented on a change in pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
mridulm commented on a change in pull request #30405: URL: https://github.com/apache/spark/pull/30405#discussion_r525836701 ## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ## @@ -110,7 +110,9 @@ private[spark] class Executor( .build() Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor] } - private val executorSource = new ExecutorSource(threadPool, executorId) + private val schemes = conf.get(EXECUTOR_METRICS_FILESYSTEM_SCHEMES) +.toLowerCase(Locale.ROOT).split(",").map(_.trim) Review comment: nit: filter out empty entries after `map`? See the sketch below.
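Purely as a sketch of what the nit suggests (the final form is up to the PR author), the snippet from the diff above might become:

```scala
// Drop empty entries left by stray commas, e.g. "file,,hdfs" or a trailing ",".
private val schemes = conf.get(EXECUTOR_METRICS_FILESYSTEM_SCHEMES)
  .toLowerCase(Locale.ROOT)
  .split(",")
  .map(_.trim)
  .filter(_.nonEmpty)
```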
[GitHub] [spark] SparkQA commented on pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode
SparkQA commented on pull request #30260: URL: https://github.com/apache/spark/pull/30260#issuecomment-729454825 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35853/
[GitHub] [spark] c21 commented on a change in pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
c21 commented on a change in pull request #29000: URL: https://github.com/apache/spark/pull/29000#discussion_r525836484 ## File path: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ## @@ -169,4 +169,8 @@ object FileCommitProtocol extends Logging { ctor.newInstance(jobId, outputPath) } } + + def getStagingDir(path: String, jobId: String): Path = { +new Path(path, ".spark-staging-" + jobId) + } Review comment: @cloud-fan - I'm wondering whether you still think FileCommitProtocol V2 (https://issues.apache.org/jira/browse/SPARK-33298) is a prerequisite for this PR, or whether this one is good to go separately and we only need to worry about allowing a prefix to be added to the file name per https://github.com/apache/spark/pull/30003?
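For context, a tiny runnable illustration of the `getStagingDir` helper from the diff above: each job gets its own `.spark-staging-<jobId>` directory under the output path, which is what keeps two concurrent jobs from colliding on a shared staging directory. The output paths and job IDs here are made up for the example.

```scala
import org.apache.hadoop.fs.Path

object StagingDirExample {
  // Mirrors the helper in the diff: a job-scoped staging dir under the output path.
  def getStagingDir(path: String, jobId: String): Path =
    new Path(path, ".spark-staging-" + jobId)

  def main(args: Array[String]): Unit = {
    // Two jobs writing to the same table stage into distinct directories.
    println(getStagingDir("/warehouse/tbl", "job-1")) // /warehouse/tbl/.spark-staging-job-1
    println(getStagingDir("/warehouse/tbl", "job-2")) // /warehouse/tbl/.spark-staging-job-2
  }
}
```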
[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external s
Victsm commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r525835969 ## File path: core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala ## @@ -92,4 +93,16 @@ private[spark] trait SchedulerBackend { */ def maxNumConcurrentTasks(rp: ResourceProfile): Int + /** + * Get the list of host locations for push based shuffle + * + * Currently push based shuffle is disabled for both stage retry and stage reuse cases Review comment: Both are true. `getShufflePushMergerLocations` will be invoked only once per `ShuffleDependency`, so retried stages will get the same merger locations. In #30062, the block push handling logic we implemented ignores blocks received after shuffle finalization. https://github.com/apache/spark/blob/dd32f45d2058d00293330c01d3d9f53ecdbc036c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java#L132 So blocks pushed from the retry stage will be ignored, which is what this comment means by `push based shuffle is disabled for both stage retry and stage reuse cases`. Ignoring blocks pushed from the retry stage is reasonable, since the block data from these retried tasks has most likely already been merged. Making sure the retried stage uses the same merger locations is critical to ensure we don't run into data duplication issues. The only exception is indeterminate stage retry, for which we have created SPARK-32923. A sketch of these two behaviors is below.
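A minimal, self-contained Scala sketch of the two behaviors described above, with entirely hypothetical names (this is not the Spark implementation): merger locations are resolved at most once per shuffle ID, so a retried stage reuses the same mergers, and block pushes that arrive after finalization are dropped.

```scala
import scala.collection.concurrent.TrieMap

object PushShuffleBehaviorSketch {
  private val mergersByShuffle = TrieMap.empty[Int, Seq[String]]
  private val finalizedShuffles = TrieMap.empty[Int, Unit]

  // Resolved at most once per shuffle; a retried stage gets the cached value.
  def mergerLocations(shuffleId: Int, resolve: () => Seq[String]): Seq[String] =
    mergersByShuffle.getOrElseUpdate(shuffleId, resolve())

  def finalizeShuffle(shuffleId: Int): Unit = {
    finalizedShuffles.put(shuffleId, ())
  }

  // Returns true only if the pushed block is accepted for merging; late
  // pushes from a stage retry arrive after finalization and are ignored.
  def onBlockPush(shuffleId: Int): Boolean =
    !finalizedShuffles.contains(shuffleId)
}
```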
[GitHub] [spark] dongjoon-hyun commented on pull request #30405: [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes
dongjoon-hyun commented on pull request #30405: URL: https://github.com/apache/spark/pull/30405#issuecomment-729453805 Thank you, @HyukjinKwon and @viirya. I'll mention this new conf in `monitoring.md` as you advised.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting ext
dongjoon-hyun commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r525835108 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ## @@ -161,6 +173,35 @@ private[spark] abstract class YarnSchedulerBackend( totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio } + override def getShufflePushMergerLocations( + numPartitions: Int, + resourceProfileId: Int): Seq[BlockManagerId] = { +// Currently this is naive way of calculating numMergersDesired for a stage. In future, +// we can use better heuristics to calculate numMergersDesired for a stage. +val maxExecutors = if (Utils.isDynamicAllocationEnabled(sc.getConf)) { + maxNumExecutors +} else { + numExecutors +} +val tasksPerExecutor = sc.resourceProfileManager + .resourceProfileFromId(resourceProfileId).maxTasksPerExecutor(sc.conf) +val numMergersDesired = math.min( + math.max(1, math.ceil(numPartitions / tasksPerExecutor).toInt), maxExecutors) +val minMergersNeeded = math.max(minMergersStaticThreshold, + math.floor(numMergersDesired * minMergersThresholdRatio).toInt) + +// Request for numMergersDesired shuffle mergers to BlockManagerMasterEndpoint +// and if its less than minMergersNeeded, we disable push based shuffle. +val mergerLocations = blockManagerMaster + .getShufflePushMergerLocations(numMergersDesired, scheduler.excludedNodes()) +if (mergerLocations.size < numMergersDesired && mergerLocations.size < minMergersNeeded) { + Seq.empty[BlockManagerId] +} else { + logDebug(s"Num merger locations available ${mergerLocations.length}") Review comment: Also, for better logging, it would be great if we could print `numMergersDesired` as well; one possible form is sketched below.
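Purely as an illustration of that suggestion (assuming the surrounding method context from the diff above; the final wording is the author's call), the debug line could fold in `numMergersDesired` like this:

```scala
// Illustrative only: report how many mergers were found vs. how many we wanted.
logDebug(s"Got ${mergerLocations.length} merger locations " +
  s"out of $numMergersDesired desired")
```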
[GitHub] [spark] BryanCutler commented on pull request #30393: [SPARK-24554][PYTHON][SQL] Add MapType support for PySpark with Arrow
BryanCutler commented on pull request #30393: URL: https://github.com/apache/spark/pull/30393#issuecomment-729452568 > BTW I believe we should also update the docs :-) Thanks for reminding me! I'll do that now.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting ext
dongjoon-hyun commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r525834706 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ## @@ -161,6 +173,35 @@ private[spark] abstract class YarnSchedulerBackend( totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio } + override def getShufflePushMergerLocations( + numPartitions: Int, + resourceProfileId: Int): Seq[BlockManagerId] = { +// Currently this is naive way of calculating numMergersDesired for a stage. In future, +// we can use better heuristics to calculate numMergersDesired for a stage. +val maxExecutors = if (Utils.isDynamicAllocationEnabled(sc.getConf)) { + maxNumExecutors +} else { + numExecutors +} +val tasksPerExecutor = sc.resourceProfileManager + .resourceProfileFromId(resourceProfileId).maxTasksPerExecutor(sc.conf) +val numMergersDesired = math.min( + math.max(1, math.ceil(numPartitions / tasksPerExecutor).toInt), maxExecutors) +val minMergersNeeded = math.max(minMergersStaticThreshold, + math.floor(numMergersDesired * minMergersThresholdRatio).toInt) + +// Request for numMergersDesired shuffle mergers to BlockManagerMasterEndpoint +// and if its less than minMergersNeeded, we disable push based shuffle. +val mergerLocations = blockManagerMaster + .getShufflePushMergerLocations(numMergersDesired, scheduler.excludedNodes()) +if (mergerLocations.size < numMergersDesired && mergerLocations.size < minMergersNeeded) { + Seq.empty[BlockManagerId] +} else { + logDebug(s"Num merger locations available ${mergerLocations.length}") Review comment: If you don't mind, shall we revise it like the following?
```scala
- logDebug(s"Num merger locations available ${mergerLocations.length}")
+ logDebug(s"The number of available locations is ${mergerLocations.length}")
```
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30404: [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
AmplabJenkins removed a comment on pull request #30404: URL: https://github.com/apache/spark/pull/30404#issuecomment-729450454 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35851/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30400: [SPARK-33469][SQL] Add current_timezone function
AmplabJenkins removed a comment on pull request #30400: URL: https://github.com/apache/spark/pull/30400#issuecomment-729441891 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35850/ Test FAILed.