[jira] [Updated] (SPARK-39950) It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage does not timeout in AQE.
[ https://issues.apache.org/jira/browse/SPARK-39950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-39950: - Summary: It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage does not timeout in AQE. (was: It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage would NOT timeout in AQE. ) > It's necessary to materialize BroadcastQueryStage first, because the > BroadcastQueryStage does not timeout in AQE. > --- > > Key: SPARK-39950 > URL: https://issues.apache.org/jira/browse/SPARK-39950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.2.2 >Reporter: weixiuli >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39950) It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage would NOT timeout in AQE.
weixiuli created SPARK-39950: Summary: It's necessary to materialize BroadcastQueryStage first, because the BroadcastQueryStage would NOT timeout in AQE. Key: SPARK-39950 URL: https://issues.apache.org/jira/browse/SPARK-39950 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.2, 3.3.0, 3.2.1 Reporter: weixiuli
[jira] [Created] (SPARK-39287) TaskSchedulerImpl should quickly ignore task finished event if its task was finished state
weixiuli created SPARK-39287: Summary: TaskSchedulerImpl should quickly ignore task finished event if its task was finished state Key: SPARK-39287 URL: https://issues.apache.org/jira/browse/SPARK-39287 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: weixiuli
[jira] [Commented] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-38856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523981#comment-17523981 ]
weixiuli commented on SPARK-38856:
OK, done. [~srowen]
> Fix a rejectedExecutionException error when push-based shuffle is enabled
>
> Key: SPARK-38856
> URL: https://issues.apache.org/jira/browse/SPARK-38856
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0, 3.2.1
> Reporter: weixiuli
> Assignee: weixiuli
> Priority: Major
>
> When push-based shuffle is enabled in our production environment, a RejectedExecutionException occurs because the shuffle pusher pool has already been shut down before it is used.
> This is the RejectedExecutionException error:
> {{FetchFailed(BlockManagerId(26,x.hadoop.jd.local, 7337, None), shuffleId=0, mapIndex=6424, mapId=4177, reduceId=1031, message=
> org.apache.spark.shuffle.FetchFailedException
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1181)
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:919)
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:81)
> at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
> at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
> at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeys_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:815)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
> at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
> at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2$$Lambda$1045/583658475@3492bd6f rejected from java.util.concurrent.ThreadPoolExecutor@2e63bad5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 243134]
> at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at org.apache.spark.shuffle.ShuffleBlockPusher.submitTask(ShuffleBlockPusher.scala:147)
> at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.handleResult(ShuffleBlockPusher.scala:235)
> at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockPushSuccess(ShuffleBlockPusher.scala:245)
> at org.apache.spark.network.shuffle.BlockPushingListener.onBlockTransferSuccess(BlockPushingListener.java:42)
> at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockTransferSuccess(ShuffleBlockPusher.scala:224)
> at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferSuccess(RetryingBlockTransferor.java:258)
> at org.apache.spark.network.shuffle.Retrying
[jira] [Updated] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-38856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
weixiuli updated SPARK-38856:
Description: When push-based shuffle is enabled in our production environment, a RejectedExecutionException occurs because the shuffle pusher pool has already been shut down before it is used.
This is the RejectedExecutionException error:
{{FetchFailed(BlockManagerId(26,x.hadoop.jd.local, 7337, None), shuffleId=0, mapIndex=6424, mapId=4177, reduceId=1031, message=
org.apache.spark.shuffle.FetchFailedException
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1181)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:919)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:81)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:815)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2$$Lambda$1045/583658475@3492bd6f rejected from java.util.concurrent.ThreadPoolExecutor@2e63bad5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 243134]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at org.apache.spark.shuffle.ShuffleBlockPusher.submitTask(ShuffleBlockPusher.scala:147)
at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.handleResult(ShuffleBlockPusher.scala:235)
at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockPushSuccess(ShuffleBlockPusher.scala:245)
at org.apache.spark.network.shuffle.BlockPushingListener.onBlockTransferSuccess(BlockPushingListener.java:42)
at org.apache.spark.shuffle.ShuffleBlockPusher$$anon$2.onBlockTransferSuccess(ShuffleBlockPusher.scala:224)
at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferSuccess(RetryingBlockTransferor.java:258)
at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockPushSuccess(RetryingBlockTransferor.java:304)
at org.apache.spark.network.shuffle.OneForOneBlockPusher$BlockPushCallback.onSuccess(OneForOneBlockPusher.java:97)
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:197)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at io.netty.channel.SimpleChannelInboundHandler.c
[jira] [Created] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled
weixiuli created SPARK-38856: Summary: Fix a rejectedExecutionException error when push-based shuffle is enabled Key: SPARK-38856 URL: https://issues.apache.org/jira/browse/SPARK-38856 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.1, 3.2.0 Reporter: weixiuli
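The failure reported in SPARK-38856 comes down to a task being handed to a `java.util.concurrent.ThreadPoolExecutor` that has already been shut down, which the default `AbortPolicy` rejects with a `RejectedExecutionException`. The sketch below reproduces that behaviour in isolation and shows one possible guard. `PusherPoolSketch` and `submitIfRunning` are hypothetical names used only for illustration; this is not the change that was made to Spark's `ShuffleBlockPusher`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical illustration, not Spark code.
final class PusherPoolSketch {

    /** Hands the task to the pool only if it still accepts work; returns false otherwise. */
    static boolean submitIfRunning(ExecutorService pool, Runnable task) {
        if (pool.isShutdown()) {
            return false;
        }
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // The pool can still be shut down between the check above and execute().
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.shutdown();

        // Unguarded submission to a shut-down pool is rejected by the default AbortPolicy.
        try {
            pool.execute(() -> { });
        } catch (RejectedExecutionException e) {
            System.out.println("rejected: pool already shut down");
        }

        // Guarded submission reports the condition instead of throwing.
        System.out.println("submitted = " + submitIfRunning(pool, () -> { }));
    }
}
```

The `isShutdown()` check alone is not sufficient under concurrency, which is why the sketch also catches the exception.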
[jira] [Created] (SPARK-38805) Remove an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory.
weixiuli created SPARK-38805: Summary: Remove an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory. Key: SPARK-38805 URL: https://issues.apache.org/jira/browse/SPARK-38805 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: weixiuli Support automatically removing an expired indexFilePath from the ESS shuffleIndexCache or the PBS indexCache to save memory.
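As a rough illustration of the idea in SPARK-38805, the sketch below keeps cached index data keyed by index file path and drops entries once a caller-supplied predicate marks the path as expired. Everything here is an assumption made for the example: the class `IndexCacheSketch`, the `long[]` offsets payload, and the notion that "expired" means the owning application's files have been cleaned up. It does not reflect the cache implementation actually used by the external shuffle service.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

// Hypothetical illustration, not Spark code.
final class IndexCacheSketch {
    // Cached offsets keyed by the index file path (assumed payload type).
    private final ConcurrentMap<String, long[]> offsetsByIndexPath = new ConcurrentHashMap<>();

    void put(String indexFilePath, long[] offsets) {
        offsetsByIndexPath.put(indexFilePath, offsets);
    }

    int size() {
        return offsetsByIndexPath.size();
    }

    /** Removes every entry whose path the predicate reports as expired; returns how many were removed. */
    int evictExpired(Predicate<String> isExpired) {
        int removed = 0;
        for (String path : offsetsByIndexPath.keySet()) {
            if (isExpired.test(path) && offsetsByIndexPath.remove(path) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        IndexCacheSketch cache = new IndexCacheSketch();
        cache.put("/data/app-1/shuffle_0_0_0.index", new long[] {0L, 128L});
        cache.put("/data/app-2/shuffle_0_0_0.index", new long[] {0L, 256L});

        // Assume app-1 has finished and its shuffle files were deleted.
        int removed = cache.evictExpired(path -> path.startsWith("/data/app-1/"));
        System.out.println("removed = " + removed + ", remaining = " + cache.size());
    }
}
```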
[jira] [Created] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection
weixiuli created SPARK-38683: Summary: It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection is terminated Key: SPARK-38683 URL: https://issues.apache.org/jira/browse/SPARK-38683 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: weixiuli Releasing the ShuffleManagedBufferIterator, ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers is unnecessary when the client channel's connection is terminated; skipping the release reduces I/O operations and improves performance for the External Shuffle Service.
[jira] [Created] (SPARK-38555) Avoid contention and get or create clientPools quickly in the TransportClientFactory
weixiuli created SPARK-38555: Summary: Avoid contention and get or create clientPools quickly in the TransportClientFactory Key: SPARK-38555 URL: https://issues.apache.org/jira/browse/SPARK-38555 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: weixiuli
[jira] [Updated] (SPARK-38428) Check the FetchShuffleBlocks message only once to improve iteration in external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38428: - Summary: Check the FetchShuffleBlocks message only once to improve iteration in external shuffle service (was: Improve FetchShuffleBlocks in External shuffle service) > Check the FetchShuffleBlocks message only once to improve iteration in > external shuffle service > > > Key: SPARK-38428 > URL: https://issues.apache.org/jira/browse/SPARK-38428 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: weixiuli >Priority: Major
[jira] [Created] (SPARK-38428) Improve FetchShuffleBlocks in External shuffle service
weixiuli created SPARK-38428: Summary: Improve FetchShuffleBlocks in External shuffle service Key: SPARK-38428 URL: https://issues.apache.org/jira/browse/SPARK-38428 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: weixiuli
[jira] [Created] (SPARK-38344) Avoid to submit task when there are no requests to push up in push-based shuffle
weixiuli created SPARK-38344: Summary: Avoid to submit task when there are no requests to push up in push-based shuffle Key: SPARK-38344 URL: https://issues.apache.org/jira/browse/SPARK-38344 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Affects Versions: 3.2.1, 3.2.0 Reporter: weixiuli
[jira] [Updated] (SPARK-38280) The Rank windows to be ordered is not necessary in a query
[ https://issues.apache.org/jira/browse/SPARK-38280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38280: - Summary: The Rank windows to be ordered is not necessary in a query (was: The Rank window to sort is not necessary in a query) > The Rank windows to be ordered is not necessary in a query > -- > > Key: SPARK-38280 > URL: https://issues.apache.org/jira/browse/SPARK-38280 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.2.1 >Reporter: weixiuli >Priority: Major
[jira] [Created] (SPARK-38280) The Rank window to sort is not necessary in a query
weixiuli created SPARK-38280: Summary: The Rank window to sort is not necessary in a query Key: SPARK-38280 URL: https://issues.apache.org/jira/browse/SPARK-38280 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: weixiuli
[jira] [Created] (SPARK-38191) The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol.
weixiuli created SPARK-38191: Summary: The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol. Key: SPARK-38191 URL: https://issues.apache.org/jira/browse/SPARK-38191 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: weixiuli
[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec
[ https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38129: - Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > Adaptively enable timeout for BroadcastQueryStageExec > - > > Key: SPARK-38129 > URL: https://issues.apache.org/jira/browse/SPARK-38129 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.2.1 >Reporter: weixiuli >Priority: Major > Fix For: 3.2.0, 3.2.1 > > > We should disable the timeout for a BroadcastQueryStageExec when it comes from shuffle query stages, whose runtime statistics are usually correct in AQE, but enable the timeout when it comes from other sources, whose statistics may be incorrect, keeping the behavior the same as non-AQE.
[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec
[ https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38129: - Fix Version/s: 3.2.1 3.2.0 > Adaptively enable timeout for BroadcastQueryStageExec > - > > Key: SPARK-38129 > URL: https://issues.apache.org/jira/browse/SPARK-38129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1 >Reporter: weixiuli >Priority: Major > Fix For: 3.2.0, 3.2.1 > > > We should disable the timeout for a BroadcastQueryStageExec when it comes from shuffle query stages, whose runtime statistics are usually correct in AQE, but enable the timeout when it comes from other sources, whose statistics may be incorrect, keeping the behavior the same as non-AQE.
[jira] [Updated] (SPARK-38129) Adaptively enable timeout for BroadcastQueryStageExec
[ https://issues.apache.org/jira/browse/SPARK-38129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38129: - Summary: Adaptively enable timeout for BroadcastQueryStageExec (was: Adaptive enable timeout for BroadcastQueryStageExec) > Adaptively enable timeout for BroadcastQueryStageExec > - > > Key: SPARK-38129 > URL: https://issues.apache.org/jira/browse/SPARK-38129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1 >Reporter: weixiuli >Priority: Major > > We should disable the timeout for a BroadcastQueryStageExec when it comes from shuffle query stages, whose runtime statistics are usually correct in AQE, but enable the timeout when it comes from other sources, whose statistics may be incorrect, keeping the behavior the same as non-AQE.
[jira] [Created] (SPARK-38129) Adaptive enable timeout for BroadcastQueryStageExec
weixiuli created SPARK-38129: Summary: Adaptive enable timeout for BroadcastQueryStageExec Key: SPARK-38129 URL: https://issues.apache.org/jira/browse/SPARK-38129 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.2.0 Reporter: weixiuli We should disable the timeout for a BroadcastQueryStageExec when it comes from shuffle query stages, whose runtime statistics are usually correct in AQE, but enable the timeout when it comes from other sources, whose statistics may be incorrect, keeping the behavior the same as non-AQE.
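The rule described in SPARK-38129 can be pictured as a small decision about how to wait on the broadcast result: wait without a deadline when the broadcast is fed by a shuffle query stage, and wait with the configured timeout otherwise. The following is only a sketch under that reading. `BroadcastTimeoutSketch`, the `Source` enum, and the `await` helper are hypothetical names, and the real `BroadcastQueryStageExec` logic in Spark may be structured quite differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration, not Spark code.
final class BroadcastTimeoutSketch {

    /** Where the broadcast stage's input comes from (assumed two-way split). */
    enum Source { SHUFFLE_QUERY_STAGE, OTHER }

    /** Timeout is skipped only when runtime statistics from a shuffle stage are available. */
    static boolean timeoutEnabled(Source source) {
        return source != Source.SHUFFLE_QUERY_STAGE;
    }

    /** Waits for the broadcast result, applying the timeout only when it is enabled. */
    static <T> T await(Future<T> broadcast, Source source, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (timeoutEnabled(source)) {
            return broadcast.get(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        return broadcast.get();
    }

    public static void main(String[] args) throws Exception {
        Future<String> ready = CompletableFuture.completedFuture("relation");
        System.out.println(await(ready, Source.SHUFFLE_QUERY_STAGE, 50));

        // A broadcast that never completes hits the timeout when it is enabled.
        Future<String> stuck = new CompletableFuture<>();
        try {
            await(stuck, Source.OTHER, 50);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```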
[jira] [Updated] (SPARK-38008) Fix the method description of refill
[ https://issues.apache.org/jira/browse/SPARK-38008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-38008: - Description: Fix the method description of refill (was: Fix the description of refill method.) Summary: Fix the method description of refill (was: Fix the description of refill method) > Fix the method description of refill > > > Key: SPARK-38008 > URL: https://issues.apache.org/jira/browse/SPARK-38008 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Fix the method description of refill
[jira] [Created] (SPARK-38008) Fix the description of refill method
weixiuli created SPARK-38008: Summary: Fix the description of refill method Key: SPARK-38008 URL: https://issues.apache.org/jira/browse/SPARK-38008 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0 Reporter: weixiuli Fix the description of refill method.
[jira] [Updated] (SPARK-37993) Avoid multiple calls to configuration parameter values
[ https://issues.apache.org/jira/browse/SPARK-37993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37993: - Description: Avoid multiple calls to configuration parameter values (was: Avoid multiple calls to conf parameter values) Summary: Avoid multiple calls to configuration parameter values (was: Avoid multiple calls to conf parameter values) > Avoid multiple calls to configuration parameter values > -- > > Key: SPARK-37993 > URL: https://issues.apache.org/jira/browse/SPARK-37993 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.2, 3.0.3, 3.1.0, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Avoid multiple calls to configuration parameter values
[jira] [Created] (SPARK-37993) Avoid multiple calls to conf parameter values
weixiuli created SPARK-37993: Summary: Avoid multiple calls to conf parameter values Key: SPARK-37993 URL: https://issues.apache.org/jira/browse/SPARK-37993 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.2, 3.0.0 Reporter: weixiuli Avoid multiple calls to conf parameter values
[jira] [Updated] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.
[ https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37984: - Description: Avoid calculating all outstanding requests to improve performance. (was: Avoid computing all outstanding requests to improve performance.) Summary: Avoid calculating all outstanding requests to improve performance. (was: Avoid computing all outstanding requests to improve performance.) > Avoid calculating all outstanding requests to improve performance. > -- > > Key: SPARK-37984 > URL: https://issues.apache.org/jira/browse/SPARK-37984 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Avoid calculating all outstanding requests to improve performance.
[jira] [Created] (SPARK-37984) Avoid computing all outstanding requests to improve performance.
weixiuli created SPARK-37984: Summary: Avoid computing all outstanding requests to improve performance. Key: SPARK-37984 URL: https://issues.apache.org/jira/browse/SPARK-37984 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.1, 3.0.0 Reporter: weixiuli Avoid computing all outstanding requests to improve performance.
[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37978: - Description: The ChunkFetchFailureException is unnecessary and can be replaced by RuntimeException. (was: Remove the useless ChunkFetchFailureException class) > Remove the unnecessary ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > The ChunkFetchFailureException is unnecessary and can be replaced by > RuntimeException.
[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37978: - Summary: Remove the unnecessary ChunkFetchFailureException class (was: Remove the useless ChunkFetchFailureException class) > Remove the unnecessary ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > Remove the useless ChunkFetchFailureException class
[jira] [Created] (SPARK-37978) Remove the useless ChunkFetchFailureException class
weixiuli created SPARK-37978: Summary: Remove the useless ChunkFetchFailureException class Key: SPARK-37978 URL: https://issues.apache.org/jira/browse/SPARK-37978 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.1, 3.0.3, 3.0.1, 3.0.0 Reporter: weixiuli Remove the useless ChunkFetchFailureException class
[jira] [Created] (SPARK-37674) Reduce the output partition of output stage to avoid producing small files.
weixiuli created SPARK-37674: Summary: Reduce the output partition of output stage to avoid producing small files. Key: SPARK-37674 URL: https://issues.apache.org/jira/browse/SPARK-37674 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0, 3.1.1, 3.0.3, 3.0.2, 3.0.0 Reporter: weixiuli For a finalStage with `DataWritingCommand` or `V2TableWriteExec`, the partition size may be taken from ADVISORY_PARTITION_SIZE_IN_BYTES, which is relatively small and can produce many small files; this can be bad for production. We should use a separate partition size for a finalStage with `DataWritingCommand` or `V2TableWriteExec` to avoid small files.
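To make the small-files concern in SPARK-37674 concrete, the sketch below estimates how many output partitions (and so roughly how many files) a final stage would produce for a given target partition size. The class name, the helper methods, and the 64 MiB and 256 MiB figures are illustrative assumptions; the issue does not say which value a write stage should use.

```java
// Hypothetical illustration, not Spark code.
final class FinalStagePartitionSketch {

    /** Picks the target size: write stages use the larger, write-specific size (assumed policy). */
    static long targetPartitionBytes(boolean isWriteStage, long advisoryBytes, long writeBytes) {
        return isWriteStage ? Math.max(advisoryBytes, writeBytes) : advisoryBytes;
    }

    /** Number of partitions needed to hold totalBytes at targetBytes each (ceiling division, at least 1). */
    static long partitionCount(long totalBytes, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        return Math.max(1L, (totalBytes + targetBytes - 1) / targetBytes);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long totalBytes = 10L * 1024L * mib; // 10 GiB of output data
        long advisoryBytes = 64L * mib;      // example advisory partition size
        long writeBytes = 256L * mib;        // example write-specific partition size

        // Non-write stage: 10240 MiB / 64 MiB = 160 partitions.
        System.out.println(partitionCount(totalBytes, targetPartitionBytes(false, advisoryBytes, writeBytes)));
        // Write stage: 10240 MiB / 256 MiB = 40 partitions, so fewer and larger files.
        System.out.println(partitionCount(totalBytes, targetPartitionBytes(true, advisoryBytes, writeBytes)));
    }
}
```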
[jira] [Created] (SPARK-37616) Support pushing down a dynamic partition pruning from one join to other joins
weixiuli created SPARK-37616: Summary: Support pushing down a dynamic partition pruning from one join to other joins Key: SPARK-37616 URL: https://issues.apache.org/jira/browse/SPARK-37616 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.1.1 Reporter: weixiuli Support pushing down a dynamic partition pruning from one join to other joins
[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning, and the predicate will be re-optimized by the AQE or non-AQE. But, sometimes the predicate may be unnecessary if the join can NOT reuse broadcastExchange or the pruning is not beneficial, and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partition pruning rule to avoid inserting unnecessary predicates to improve performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partitioning pruning rule to avoid inserting unnecessary predicates to improve performance. > Optimize the dynamic partitioning prune rules to avoid inserting unnecessary > predicates to improve performance > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Reporter: weixiuli > Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning, and the predicate will be re-optimized > by the AQE or non-AQE.
> But, sometimes the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or the pruning is not beneficial, and it will be dropped by the rules of > the AQE or non-AQE. > We should optimize the dynamic partition pruning rule to avoid inserting > unnecessary predicates to improve performance.
[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partitioning pruning rule to avoid inserting unnecessary predicates to improve performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance. > Optimize the dynamic partitioning prune rules to avoid inserting unnecessary > predicates to improve performance > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. 
> But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. > We should optimize the dynamic partitioning pruning rule to avoid inserting > unnecessary predicates to improve performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Summary: optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance (was: Improve the Dynamic partition pruning ) > optimize the dynamic partitioning prune rules to avoid inserting unnecessary > predicates to improve performance > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. > We should optimize the dynamic partitioning prune rules to avoid inserting > unnecessary predicates to improve performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Summary: Optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance (was: optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance) > Optimize the dynamic partitioning prune rules to avoid inserting unnecessary > predicates to improve performance > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. > We should optimize the dynamic partitioning prune rules to avoid inserting > unnecessary predicates to improve performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partitioning prune rules to avoid inserting unnecessary predicates to improve performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partition pruning rule and avoid inserting unnecessary predicate to improve the performance. > Improve the Dynamic partition pruning > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. 
> We should optimize the dynamic partitioning prune rules to avoid inserting > unnecessary predicates to improve performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partition pruning rule and avoid inserting unnecessary predicate to improve the performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partition pruning rule and avoid inserting unnecessary predicate to improve the performance. > Improve the Dynamic partition pruning > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. 
> We should optimize the dynamic partition pruning rule and avoid inserting > unnecessary predicate to improve the performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the dynamic partition pruning rule and avoid inserting unnecessary predicate to improve the performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the PartitionPruning and avoid inserting unnecessary predicate to improve the performance. > Improve the Dynamic partition pruning > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. 
> We should optimize the dynamic partition pruning rule and avoid inserting > unnecessary predicate to improve the performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the PartitionPruning and avoid inserting unnecessary predicate to improve the performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the PartitionPruning and avoid insert unnecessary predicate to improve the performance. > Improve the Dynamic partition pruning > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. 
> We should optimize the PartitionPruning and avoid inserting unnecessary > predicate to improve the performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37542) Improve the Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-37542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37542: - Description: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rules of the AQE or non-AQE. We should optimize the PartitionPruning and avoid insert unnecessary predicate to improve the performance. was: Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rule of the AQE rule or non-AQE. We should optimize the PartitionPruning and avoid insert unnecessary predicate to improve the performance. > Improve the Dynamic partition pruning > -- > > Key: SPARK-37542 > URL: https://issues.apache.org/jira/browse/SPARK-37542 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: weixiuli >Priority: Major > > Currently, the dynamic partition pruning rule will insert a predicate on the > filterable table using the filter from the other side of the join and a > custom wrapper called DynamicPruning,and the predicate will be re-optimized > by the AQE or non-AQE. > But, some time the predicate may be unnecessary if the join can NOT reuse > broadcastExchange or it is not benefit,and it will be dropped by the rules of > the AQE or non-AQE. 
> We should optimize the PartitionPruning and avoid insert unnecessary > predicate to improve the performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37542) Improve the Dynamic partition pruning
weixiuli created SPARK-37542: Summary: Improve the Dynamic partition pruning Key: SPARK-37542 URL: https://issues.apache.org/jira/browse/SPARK-37542 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: weixiuli Currently, the dynamic partition pruning rule will insert a predicate on the filterable table using the filter from the other side of the join and a custom wrapper called DynamicPruning,and the predicate will be re-optimized by the AQE or non-AQE. But, some time the predicate may be unnecessary if the join can NOT reuse broadcastExchange or it is not benefit,and it will be dropped by the rule of the AQE rule or non-AQE. We should optimize the PartitionPruning and avoid insert unnecessary predicate to improve the performance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37524) We should drop all tables after testing dynamic partition pruning
weixiuli created SPARK-37524: Summary: We should drop all tables after testing dynamic partition pruning Key: SPARK-37524 URL: https://issues.apache.org/jira/browse/SPARK-37524 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: weixiuli We should drop all tables after testing dynamic partition pruning.
[jira] [Updated] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37462: - Description: It is unnecessary to calculate the number of outstanding fetch requests and RPCs when the IdleStateEvent is not IDLE or the last request has not timed out. (was: To avoid unnecessary calculation of outstanding fetch requests and RPCS) Summary: Avoid unnecessary calculating the number of outstanding fetch requests and RPCS (was: To avoid unnecessary calculation of outstanding fetch requests and RPCS) > Avoid unnecessary calculating the number of outstanding fetch requests and > RPCS > > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core > Affects Versions: 3.1.0, 3.2.0 > Reporter: weixiuli > Priority: Major > > It is unnecessary to calculate the number of outstanding fetch requests and > RPCs when the IdleStateEvent is not IDLE or the last request has not timed out.
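The ordering of checks described in SPARK-37462 can be sketched as follows. This is a minimal illustration in Python, not the Spark implementation: the function name, its parameters, and the `count_outstanding` callback are all hypothetical stand-ins for the idle-state check and the count of outstanding fetch requests and RPCs.

```python
from typing import Callable


def has_timed_out_requests(
    is_all_idle: bool,
    nanos_since_last_request: int,
    timeout_nanos: int,
    count_outstanding: Callable[[], int],
) -> bool:
    """Return True when the channel is idle past the timeout with requests still pending."""
    # Evaluate the two cheap conditions first. If either fails, return early
    # so the (hypothetical) outstanding-request count is never computed.
    if not (is_all_idle and nanos_since_last_request > timeout_nanos):
        return False
    return count_outstanding() > 0


# Demo: record how often the count is actually computed.
calls = []


def count_outstanding() -> int:
    calls.append(1)
    return 3


print(has_timed_out_requests(False, 10, 5, count_outstanding), len(calls))
print(has_timed_out_requests(True, 10, 5, count_outstanding), len(calls))
```

The point of the ordering is that the count is only paid for on the path where its result can change the outcome.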
[jira] [Updated] (SPARK-37462) To avoid unnecessary calculation of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37462: - Description: To avoid unnecessary calculation of outstanding fetch requests and RPCS (was: To avoid unnecessary flight request calculations) Summary: To avoid unnecessary calculation of outstanding fetch requests and RPCS (was: To avoid unnecessary flight request calculations) > To avoid unnecessary calculation of outstanding fetch requests and RPCS > --- > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Priority: Major > > To avoid unnecessary calculation of outstanding fetch requests and RPCS -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37462) To avoid unnecessary flight request calculations
weixiuli created SPARK-37462: Summary: To avoid unnecessary flight request calculations Key: SPARK-37462 URL: https://issues.apache.org/jira/browse/SPARK-37462 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.0 Reporter: weixiuli To avoid unnecessary flight request calculations
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: An executor which is running on a bad node (e.g. the system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution. Although there are speculative mechanisms to resolve this problem, sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.3.0 > Reporter: weixiuli > Priority: Major > > An executor which is running on a bad node (e.g. the system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution. Although there are speculative mechanisms to resolve this problem, > sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI.
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executor link in the Web UI. (was: Add a 'kill' executor link in Web UI.) > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or it has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or it has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was:Add a 'kill' executors link in Web UI. > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or it has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executor link in Web UI. (was: Add a 'kill' executors link in Web UI.) > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executors link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executors link in Web UI. (was: Add 'kill' executors link in Web UI.) > Add a 'kill' executors link in Web UI. > --- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37028) Add 'kill' executors link in Web UI.
weixiuli created SPARK-37028: Summary: Add 'kill' executors link in Web UI. Key: SPARK-37028 URL: https://issues.apache.org/jira/browse/SPARK-37028 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: weixiuli Add 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add 'kill' executors link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: Add a 'kill' executors link in Web UI. (was: Add 'kill' executors link in Web UI.) > Add 'kill' executors link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36635) spark-sql do NOT support that select name expression as string type now
[ https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409222#comment-17409222 ]

weixiuli commented on SPARK-36635:

It didn't work before either.

> spark-sql do NOT support that select name expression as string type now
> Key: SPARK-36635
> URL: https://issues.apache.org/jira/browse/SPARK-36635
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0, 3.1.2
> Reporter: weixiuli
> Priority: Major
>
> The following statement throws an exception.
> {code:java}
> sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")
> {code}
> {code:java}
> // Exception information
> Error in query:
> mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)
> == SQL ==
> SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)
> --------------^^^
> {code}
[jira] [Created] (SPARK-36635) spark-sql do NOT support that select name expression as string type now
weixiuli created SPARK-36635:
Summary: spark-sql do NOT support that select name expression as string type now
Key: SPARK-36635
URL: https://issues.apache.org/jira/browse/SPARK-36635
Project: Spark
Issue Type: Bug
Components: Block Manager
Affects Versions: 3.1.2, 3.1.0
Reporter: weixiuli

The following statement throws an exception.
{code:java}
sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")
{code}
{code:java}
// Exception information
Error in query:
mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)
== SQL ==
SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)
--------------^^^
{code}
[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce reading of ORC data.
[ https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35783: - Description: Now, the ORC reader will read all columns of the ORC table when the task configuration does not set the read column list. Therefore, we should set the list of read columns in the task configuration to reduce reading of ORC data. (was: Now, if the read column list is not set in the task configuration, it will read all columns in the ORC table. Therefore, we should set the list of read columns in the task configuration to reduce reading of ORC data.) > Set the list of read columns in the task configuration to reduce reading of > ORC data. > - > > Key: SPARK-35783 > URL: https://issues.apache.org/jira/browse/SPARK-35783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > Now, the ORC reader will read all columns of the ORC table when the task > configuration does not set the read column list. Therefore, we should set the > list of read columns in the task configuration to reduce reading of ORC data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce reading of ORC data.
[ https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35783: - Summary: Set the list of read columns in the task configuration to reduce reading of ORC data. (was: Set the list of read columns in the task configuration to reduce read data for ORC) > Set the list of read columns in the task configuration to reduce reading of > ORC data. > - > > Key: SPARK-35783 > URL: https://issues.apache.org/jira/browse/SPARK-35783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > Now, if the read column list is not set in the task configuration, it will > read all columns in the ORC table. Therefore, we should set the list of read > columns in the task configuration to reduce reading of ORC data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce read data for ORC
[ https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35783: - Description: Now, if the read column list is not set in the task configuration, it will read all columns in the ORC table. Therefore, we should set the list of read columns in the task configuration to reduce reading of ORC data. (was: Now, if the read column list is not set in the task configuration, it will read all columns in the ORC table. Therefore, we should set the list of read columns in the task configuration to reduce the ORC read data.) > Set the list of read columns in the task configuration to reduce read data > for ORC > -- > > Key: SPARK-35783 > URL: https://issues.apache.org/jira/browse/SPARK-35783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > Now, if the read column list is not set in the task configuration, it will > read all columns in the ORC table. Therefore, we should set the list of read > columns in the task configuration to reduce reading of ORC data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35783) Set the list of read columns in the task configuration to reduce read data for ORC
[ https://issues.apache.org/jira/browse/SPARK-35783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35783: - Summary: Set the list of read columns in the task configuration to reduce read data for ORC (was: Set the list of read columns in the task configuration for reducing read data for ORC) > Set the list of read columns in the task configuration to reduce read data > for ORC > -- > > Key: SPARK-35783 > URL: https://issues.apache.org/jira/browse/SPARK-35783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > Now, if the read column list is not set in the task configuration, it will > read all columns in the ORC table. Therefore, we should set the list of read > columns in the task configuration to reduce the ORC read data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35783) Set the list of read columns in the task configuration for reducing read data for ORC
weixiuli created SPARK-35783:
Summary: Set the list of read columns in the task configuration for reducing read data for ORC
Key: SPARK-35783
URL: https://issues.apache.org/jira/browse/SPARK-35783
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.2, 3.1.0, 3.0.1, 3.0.0
Reporter: weixiuli

Currently, if the list of columns to read is not set in the task configuration, the ORC reader reads all columns of the ORC table. We should therefore set the read column list in the task configuration to reduce the amount of ORC data read.
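The behaviour described in SPARK-35783 can be sketched as follows. This is a simplified Python illustration, not Spark's ORC reader: the configuration key `read.column.ids` and both helper functions are hypothetical and only show how an unset column list falls back to reading every column.

```python
from typing import Dict, List

# Hypothetical configuration key; the issue does not name the real one.
READ_COLUMN_IDS_KEY = "read.column.ids"


def set_read_columns(conf: Dict[str, str], schema: List[str], required: List[str]) -> None:
    """Record the indices of the required columns in the task configuration."""
    ids = [str(schema.index(name)) for name in required]
    conf[READ_COLUMN_IDS_KEY] = ",".join(ids)


def columns_to_read(conf: Dict[str, str], schema: List[str]) -> List[str]:
    """Return the columns a reader would load for this task."""
    raw = conf.get(READ_COLUMN_IDS_KEY)
    if raw is None:
        # No column list configured: the reader falls back to all columns.
        return list(schema)
    return [schema[int(i)] for i in raw.split(",") if i != ""]


schema = ["id", "name", "age", "address"]
conf: Dict[str, str] = {}
all_cols = columns_to_read(conf, schema)       # list unset: every column is read
set_read_columns(conf, schema, ["id", "age"])
pruned_cols = columns_to_read(conf, schema)    # list set: only the needed columns
```

With the list set, the reader in this sketch touches two of the four columns instead of all of them, which is the reduction the issue asks for.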
[jira] [Created] (SPARK-35424) Remove some useless code in ExternalBlockHandler
weixiuli created SPARK-35424: Summary: Remove some useless code in ExternalBlockHandler Key: SPARK-35424 URL: https://issues.apache.org/jira/browse/SPARK-35424 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.1.1, 3.0.2, 3.2.0 Reporter: weixiuli There is some useless code in the ExternalBlockHandler, so we may remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342266#comment-17342266 ]

weixiuli commented on SPARK-30186:

https://github.com/apache/spark/pull/31941

> support Dynamic Partition Pruning in Adaptive Execution
> Key: SPARK-30186
> URL: https://issues.apache.org/jira/browse/SPARK-30186
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Xiaoju Wu
> Priority: Major
>
> Currently, Adaptive Execution cannot work if Dynamic Partition Pruning is
> applied:
> private def supportAdaptive(plan: SparkPlan): Boolean = {
>   // TODO migrate dynamic-partition-pruning onto adaptive execution.
>   sanityCheck(plan) &&
>     !plan.logicalLink.exists(_.isStreaming) &&
>     !plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined) &&
>     plan.children.forall(supportAdaptive)
> }
> This means we cannot get the performance benefits of both AE and DPP.
> This ticket aims to make DPP and AE work together.
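The effect of the quoted `supportAdaptive` check can be illustrated with a small Python sketch. It is not Spark code: `PlanNode` and its boolean flags are hypothetical stand-ins for a physical plan, and the `sanityCheck` call is left out. It only shows that a single node with dynamic pruning anywhere in the tree disables adaptive execution for the whole plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node."""
    name: str
    is_streaming: bool = False
    has_dynamic_pruning: bool = False
    children: List["PlanNode"] = field(default_factory=list)


def support_adaptive(plan: PlanNode) -> bool:
    # Same shape as the quoted check: every node in the tree must pass.
    return (
        not plan.is_streaming
        and not plan.has_dynamic_pruning
        and all(support_adaptive(child) for child in plan.children)
    )


scan_with_dpp = PlanNode("scan", has_dynamic_pruning=True)
join_with_dpp = PlanNode("join", children=[scan_with_dpp, PlanNode("scan")])
join_without_dpp = PlanNode("join", children=[PlanNode("scan"), PlanNode("scan")])
```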
[jira] [Updated] (SPARK-35200) Avoid to recompute the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code
[ https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35200: - Fix Version/s: 3.0.1 3.0.2 3.1.0 3.1.1 > Avoid to recompute the pending speculative tasks in the > ExecutorAllocationManager and remove unnecessary code > - > > Key: SPARK-35200 > URL: https://issues.apache.org/jira/browse/SPARK-35200 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.1.1 >Reporter: weixiuli >Priority: Major > Fix For: 3.0.1, 3.0.2, 3.1.0, 3.1.1 > > > The number of the pending speculative tasks is recomputed in the > ExecutorAllocationManager to calculate the maximum number of executors > required. While , it only needs to be computed once to improve performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35200) Avoid to recompute the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code
[ https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35200: - Description: The number of the pending speculative tasks is recomputed in the ExecutorAllocationManager to calculate the maximum number of executors required. While , it only needs to be computed once to improve performance. was: The number of the pending speculative tasks is recomputed in the ExecutorAllocationManager to calculate the maximum number of executors required. while , it only needs to be computed once to improve performance. > Avoid to recompute the pending speculative tasks in the > ExecutorAllocationManager and remove unnecessary code > - > > Key: SPARK-35200 > URL: https://issues.apache.org/jira/browse/SPARK-35200 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.1.1 >Reporter: weixiuli >Priority: Major > > The number of the pending speculative tasks is recomputed in the > ExecutorAllocationManager to calculate the maximum number of executors > required. While , it only needs to be computed once to improve performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35200) Avoid to recompute the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code
[ https://issues.apache.org/jira/browse/SPARK-35200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-35200: - Summary: Avoid to recompute the pending speculative tasks in the ExecutorAllocationManager and remove unnecessary code (was: Avoid to recompute the pending tasks in the ExecutorAllocationManager and remove unnecessary code) > Avoid to recompute the pending speculative tasks in the > ExecutorAllocationManager and remove unnecessary code > - > > Key: SPARK-35200 > URL: https://issues.apache.org/jira/browse/SPARK-35200 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.1.1 >Reporter: weixiuli >Priority: Major > > The number of the pending speculative tasks is recomputed in the > ExecutorAllocationManager to calculate the maximum number of executors > required. while , it only needs to be computed once to improve performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35200) Avoid to recompute the pending tasks in the ExecutorAllocationManager and remove unnecessary code
weixiuli created SPARK-35200:
Summary: Avoid to recompute the pending tasks in the ExecutorAllocationManager and remove unnecessary code
Key: SPARK-35200
URL: https://issues.apache.org/jira/browse/SPARK-35200
Project: Spark
Issue Type: Improvement
Components: Scheduler, Spark Core
Affects Versions: 3.1.1, 3.1.0, 3.0.2
Reporter: weixiuli

The number of pending speculative tasks is recomputed in the ExecutorAllocationManager when calculating the maximum number of executors required, although it only needs to be computed once. Computing it once improves performance.
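The change proposed in SPARK-35200 amounts to caching a derived count in a local variable. The Python sketch below is an illustration under assumptions, not the ExecutorAllocationManager code: the class, its fields, and the rule for adding one extra executor are hypothetical, and a call counter stands in for the cost of deriving the count.

```python
import math
from typing import List


class AllocationSketch:
    """Toy stand-in for an allocation manager; all names are hypothetical."""

    def __init__(self, pending: int, pending_speculative: List[int],
                 running: int, tasks_per_executor: int) -> None:
        self.pending = pending
        self.pending_speculative = pending_speculative  # ids of speculative tasks
        self.running = running
        self.tasks_per_executor = tasks_per_executor
        self.count_calls = 0

    def pending_speculative_tasks(self) -> int:
        # Stands in for a count that is costly to derive.
        self.count_calls += 1
        return len(self.pending_speculative)

    def max_needed_recomputing(self) -> int:
        # The count is derived twice within one calculation.
        total = self.pending + self.pending_speculative_tasks() + self.running
        needed = math.ceil(total / self.tasks_per_executor)
        if needed == 1 and self.pending_speculative_tasks() > 0:
            needed += 1  # illustrative rule: leave room for the speculative copy
        return needed

    def max_needed_once(self) -> int:
        # Same result; the count is derived once and reused.
        speculative = self.pending_speculative_tasks()
        total = self.pending + speculative + self.running
        needed = math.ceil(total / self.tasks_per_executor)
        if needed == 1 and speculative > 0:
            needed += 1
        return needed


before = AllocationSketch(1, [7], 1, 4)
after = AllocationSketch(1, [7], 1, 4)
result_before = before.max_needed_recomputing()
result_after = after.max_needed_once()
```

Both variants return the same number of executors; only the number of times the count is derived differs.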
[jira] [Updated] (SPARK-34834) There is a potential Netty memory leak in TransportResponseHandler.
[ https://issues.apache.org/jira/browse/SPARK-34834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-34834: - Summary: There is a potential Netty memory leak in TransportResponseHandler. (was: There is a potential netty leak in TransportResponseHandler.) > There is a potential Netty memory leak in TransportResponseHandler. > --- > > Key: SPARK-34834 > URL: https://issues.apache.org/jira/browse/SPARK-34834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.7, 3.0.2, 3.1.0, 3.1.1 >Reporter: weixiuli >Priority: Major > > There is a potential netty leak in TransportResponseHandler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34834) There is a potential Netty memory leak in TransportResponseHandler.
[ https://issues.apache.org/jira/browse/SPARK-34834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-34834: - Description: There is a potential Netty memory leak in TransportResponseHandler. (was: There is a potential netty leak in TransportResponseHandler.) > There is a potential Netty memory leak in TransportResponseHandler. > --- > > Key: SPARK-34834 > URL: https://issues.apache.org/jira/browse/SPARK-34834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.7, 3.0.2, 3.1.0, 3.1.1 >Reporter: weixiuli >Priority: Major > > There is a potential Netty memory leak in TransportResponseHandler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34834) There is a potential netty leak in TransportResponseHandler.
weixiuli created SPARK-34834: Summary: There is a potential netty leak in TransportResponseHandler. Key: SPARK-34834 URL: https://issues.apache.org/jira/browse/SPARK-34834 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.1.1, 3.1.0, 3.0.2, 2.4.7 Reporter: weixiuli There is a potential netty leak in TransportResponseHandler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerunning.
[ https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weixiuli updated SPARK-33747:
Description: When a fetch failure happens, DAGScheduler tries to unregister the corresponding map output. The current logic has a race condition: a new map stage attempt may be running while the current reduce stage attempt returns another fetch failure. (Note: the current reduce stage first returns a fetch failure, which causes the map stage to be rerun. The rerunning map stage may then report a new MapStatus for the failed MapId before the current reduce stage returns another fetch failure for the same MapId. The current reduce stage is still the latest attempt because the new map stage has not yet completed.) In this case, if the map output is always unregistered, DAGScheduler may actually unregister the map output from the new map stage attempt.
(was: When a fetch failure happened, DAGScheduler will try to unregister the corresponding map output. The current logic has a race condition that the new map stage attempt is running while the old reduce stage attempt returns another fetch failure. In this case, if the map output is always unregistered, it may actually unregister the map output from the new map stage attempt.)

> Avoid calling unregisterMapOutput when the map stage is being rerunning.
> Key: SPARK-33747
> URL: https://issues.apache.org/jira/browse/SPARK-33747
> Project: Spark
> Issue Type: Bug
> Components: Block Manager
> Affects Versions: 2.4.5, 3.0.1
> Reporter: weixiuli
> Priority: Major
> Fix For: 2.4.5, 3.0.1
>
> When a fetch failure happens, DAGScheduler tries to unregister the
> corresponding map output. The current logic has a race condition: a new map
> stage attempt may be running while the current reduce stage attempt returns
> another fetch failure. (Note: the current reduce stage first returns a fetch
> failure, which causes the map stage to be rerun. The rerunning map stage may
> then report a new MapStatus for the failed MapId before the current reduce
> stage returns another fetch failure for the same MapId. The current reduce
> stage is still the latest attempt because the new map stage has not yet
> completed.) In this case, if the map output is always unregistered,
> DAGScheduler may actually unregister the map output from the new map stage
> attempt.
[jira] [Updated] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerunning.
[ https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-33747: - Fix Version/s: 2.4.5 3.0.1 > Avoid calling unregisterMapOutput when the map stage is being rerunning. > > > Key: SPARK-33747 > URL: https://issues.apache.org/jira/browse/SPARK-33747 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 2.4.5, 3.0.1 >Reporter: weixiuli >Priority: Major > Fix For: 2.4.5, 3.0.1 > > > When a fetch failure happened, DAGScheduler will try to unregister the > corresponding map output. The current logic has a race condition that the new > map stage attempt is running while the old reduce stage attempt returns > another fetch failure. In this case, if the map output is always > unregistered, it may actually unregister the map output from the new map > stage attempt. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33747) Avoid calling unregisterMapOutput when the map stage is being rerunning.
weixiuli created SPARK-33747:
Summary: Avoid calling unregisterMapOutput when the map stage is being rerunning.
Key: SPARK-33747
URL: https://issues.apache.org/jira/browse/SPARK-33747
Project: Spark
Issue Type: Bug
Components: Block Manager
Affects Versions: 3.0.1, 2.4.5
Reporter: weixiuli

When a fetch failure happens, DAGScheduler tries to unregister the corresponding map output. The current logic has a race condition: a new map stage attempt may be running while the old reduce stage attempt returns another fetch failure. In that case, if the map output is always unregistered, DAGScheduler may actually unregister the map output produced by the new map stage attempt.
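The race described in SPARK-33747 can be replayed with a small single-threaded Python sketch. It is an illustration only, not DAGScheduler code: `MapOutputSketch`, its fields, and the `guard` flag are hypothetical. The guard follows the issue title by skipping the unregister step while a map stage attempt is being rerun.

```python
from typing import Dict, Optional, Tuple


class MapOutputSketch:
    """Toy registry of map outputs; all names are hypothetical."""

    def __init__(self) -> None:
        # map_id -> (map stage attempt that produced it, location)
        self.outputs: Dict[int, Tuple[int, str]] = {}
        self.running_map_attempt: Optional[int] = None

    def register(self, map_id: int, attempt: int, location: str) -> None:
        self.outputs[map_id] = (attempt, location)

    def on_fetch_failure(self, map_id: int, guard: bool) -> bool:
        """Handle a fetch failure for map_id; return True if an output was dropped."""
        if guard and self.running_map_attempt is not None:
            # The map stage is being rerun: its fresh output must not be dropped.
            return False
        return self.outputs.pop(map_id, None) is not None


def simulate(guard: bool) -> Optional[Tuple[int, str]]:
    tracker = MapOutputSketch()
    tracker.register(0, attempt=0, location="exec-1")
    tracker.on_fetch_failure(0, guard)                  # first failure drops the old output
    tracker.running_map_attempt = 1                     # the map stage is resubmitted
    tracker.register(0, attempt=1, location="exec-2")   # the rerun finishes map 0
    tracker.on_fetch_failure(0, guard)                  # late failure from the same reduce attempt
    return tracker.outputs.get(0)


without_guard = simulate(guard=False)
with_guard = simulate(guard=True)
```

Without the guard, the late fetch failure removes the output that the rerun has just registered; with it, the new output survives.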
[jira] [Updated] (SPARK-32170) Improve the speculation for the inefficient tasks by the task metrics.
[ https://issues.apache.org/jira/browse/SPARK-32170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-32170: - Description: 1) Tasks will be speculated when meet certain conditions no matter they are inefficient or not,this would be a huge waste of cluster resources. 2) In production,the speculation task comes from an efficient one will be killed finally,which is unnecessary and will waste of cluster resources. 3) So, we should evaluate whether the task is inefficient by success tasks metrics firstly, and then decide to speculate it or not. The inefficient task will be speculated and efficient one will not, it is better for the cluster resources. was: 1) Tasks will be speculated when meet certain conditions no matter they are inefficient or not,this would be a huge waste of cluster resources. 2) In production,the speculation task from an efficient one will be killed finally,which is unnecessary and will waste of cluster resources. 3) So, we should evaluate whether the task is inefficient by success tasks metrics firstly, and then decide to speculate it or not. The inefficient task will be speculated and efficient one will not, it is better for the cluster resources. > Improve the speculation for the inefficient tasks by the task metrics. > --- > > Key: SPARK-32170 > URL: https://issues.apache.org/jira/browse/SPARK-32170 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.0.0 >Reporter: weixiuli >Priority: Major > Fix For: 3.0.0 > > > 1) Tasks will be speculated when meet certain conditions no matter they are > inefficient or not,this would be a huge waste of cluster resources. > 2) In production,the speculation task comes from an efficient one will be > killed finally,which is unnecessary and will waste of cluster resources. > 3) So, we should evaluate whether the task is inefficient by success tasks > metrics firstly, and then decide to speculate it or not. 
The inefficient > task will be speculated and efficient one will not, it is better for the > cluster resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32170) Improve the speculation for the inefficient tasks by the task metrics.
[ https://issues.apache.org/jira/browse/SPARK-32170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-32170: - Description: 1) Tasks will be speculated when meet certain conditions no matter they are inefficient or not,this would be a huge waste of cluster resources. 2) In production,the speculation task from an efficient one will be killed finally,which is unnecessary and will waste of cluster resources. 3) So, we should evaluate whether the task is inefficient by success tasks metrics firstly, and then decide to speculate it or not. The inefficient task will be speculated and efficient one will not, it is better for the cluster resources. was: 1) Tasks will be speculated when meet certain conditions no matter they are inefficient or not,this would be a huge waste of cluster resources. 2) In production,the speculation task from an efficient one will be killed finally,which is unnecessary and will waste of cluster resources. 3) So, we should evaluate whether the task is inefficient by success tasks metrics firstly, and then decide to speculate it or not. The inefficient task will be speculated and efficient one will not, it better for the cluster resources. > Improve the speculation for the inefficient tasks by the task metrics. > --- > > Key: SPARK-32170 > URL: https://issues.apache.org/jira/browse/SPARK-32170 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.0.0 >Reporter: weixiuli >Priority: Major > Fix For: 3.0.0 > > > 1) Tasks will be speculated when meet certain conditions no matter they are > inefficient or not,this would be a huge waste of cluster resources. > 2) In production,the speculation task from an efficient one will be killed > finally,which is unnecessary and will waste of cluster resources. > 3) So, we should evaluate whether the task is inefficient by success tasks > metrics firstly, and then decide to speculate it or not. 
The inefficient > task will be speculated and efficient one will not, it is better for the > cluster resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32170) Improve the speculation for the inefficient tasks by the task metrics.
weixiuli created SPARK-32170:
Summary: Improve the speculation for the inefficient tasks by the task metrics.
Key: SPARK-32170
URL: https://issues.apache.org/jira/browse/SPARK-32170
Project: Spark
Issue Type: Improvement
Components: Scheduler, Spark Core
Affects Versions: 3.0.0
Reporter: weixiuli
Fix For: 3.0.0

1) Tasks are speculated when they meet certain conditions, whether or not they are inefficient, which can waste a large amount of cluster resources.
2) In production, a speculative task launched for an efficient task is eventually killed, which is unnecessary and wastes cluster resources.
3) We should therefore first evaluate whether a task is inefficient, using the metrics of successful tasks, and then decide whether to speculate it. Inefficient tasks are speculated and efficient ones are not, which makes better use of cluster resources.
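The decision proposed in SPARK-32170 can be sketched in Python as below. The issue does not say which metrics or thresholds are used, so everything here is an assumption made for illustration: the processing rate (records per second), the median baseline, the `factor` of 0.5, and both function names.

```python
from statistics import median
from typing import List


def is_inefficient(task_rate: float, successful_rates: List[float],
                   factor: float = 0.5) -> bool:
    """Treat a running task as inefficient when its processing rate is well
    below the median rate of tasks that already succeeded."""
    if not successful_rates:
        # No baseline yet: fall back to the runtime-based decision alone.
        return True
    return task_rate < factor * median(successful_rates)


def should_speculate(runtime_s: float, runtime_threshold_s: float,
                     task_rate: float, successful_rates: List[float]) -> bool:
    # Speculate only when the task is both slow by runtime and inefficient by metrics.
    return runtime_s > runtime_threshold_s and is_inefficient(task_rate, successful_rates)


finished_rates = [100.0, 120.0, 110.0]  # records per second of successful tasks

# Long-running task with a normal rate (for example, a larger input split).
large_but_efficient = should_speculate(90.0, 60.0, 105.0, finished_rates)
# Long-running task with a much lower rate than its peers.
slow_and_inefficient = should_speculate(90.0, 60.0, 20.0, finished_rates)
```

In this sketch the first task is left alone even though it exceeds the runtime threshold, which avoids launching a speculative copy that would later be killed.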
[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963818#comment-16963818 ] weixiuli commented on SPARK-27736: -- https://github.com/apache/spark/pull/26206 > Improve handling of FetchFailures caused by ExternalShuffleService losing > track of executor registrations > - > > Key: SPARK-27736 > URL: https://issues.apache.org/jira/browse/SPARK-27736 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: Josh Rosen >Priority: Minor > > This ticket describes a fault-tolerance edge-case which can cause Spark jobs > to fail if a single external shuffle service process reboots and fails to > recover the list of registered executors (something which can happen when > using YARN if NodeManager recovery is disabled) _and_ the Spark job has a > large number of executors per host. > I believe this problem can be worked around today via a change of > configurations, but I'm filing this issue to (a) better document this > problem, and (b) propose either a change of default configurations or > additional DAGScheduler logic to better handle this failure mode. > h2. Problem description > The external shuffle service process is _mostly_ stateless except for a map > tracking the set of registered applications and executors. > When processing a shuffle fetch request, the shuffle services first checks > whether the requested block ID's executor is registered; if it's not > registered then the shuffle service throws an exception like > {code:java} > java.lang.RuntimeException: Executor is not registered > (appId=application_1557557221330_6891, execId=428){code} > and this exception becomes a {{FetchFailed}} error in the executor requesting > the shuffle block. > In normal operation this error should not occur because executors shouldn't > be mis-routing shuffle fetch requests. 
However, this _can_ happen if the > shuffle service crashes and restarts, causing it to lose its in-memory > executor registration state. With YARN this state can be recovered from disk > if YARN NodeManager recovery is enabled (using the mechanism added in > SPARK-9439), but I don't believe that we perform state recovery in Standalone > and Mesos modes (see SPARK-24223). > If state cannot be recovered then map outputs cannot be served (even though > the files probably still exist on disk). In theory, this shouldn't cause > Spark jobs to fail because we can always redundantly recompute lost / > unfetchable map outputs. > However, in practice this can cause total job failures in deployments where > the node with the failed shuffle service was running a large number of > executors: by default, the DAGScheduler unregisters map outputs _only from > the individual executor whose shuffle blocks could not be fetched_ (see > [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]), > so it can take several rounds of failed stage attempts to fail and clear > output from all executors on the faulty host. If the number of executors on a > host is greater than the stage retry limit then this can exhaust stage retry > attempts and cause job failures. > This "multiple rounds of recomputation to discover all failed executors on a > host" problem was addressed by SPARK-19753, which added a > {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which > promotes executor fetch failures into host-wide fetch failures (clearing > output from all neighboring executors upon a single failure). However, that > configuration is {{false}} by default. > h2. 
Potential solutions > I have a few ideas about how we can improve this situation: > - Update the [YARN external shuffle service > documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service] > to recommend enabling node manager recovery. > - Consider defaulting {{spark.files.fetchFailure.unRegisterOutputOnHost}} to > {{true}}. This would improve out-of-the-box resiliency for large clusters. > The trade-off here is a reduction of efficiency in case there are transient > "false positive" fetch failures, but I suspect this case may be unlikely in > practice (so the change of default could be an acceptable trade-off). See > [prior discussion on > GitHub|https://github.com/apache/spark/pull/18150#discussion_r119736751]. > - Modify DAGScheduler to add special-case handling for "Executor is not > registered" exceptions that trigger FetchFailures: if we see this exception > then it implies that the shuffle service failed to recover state, implying > that all of its pri
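The retry arithmetic in the SPARK-27736 problem description can be illustrated with a small stand-alone model. This is only a sketch and not Spark code: `FetchFailureRounds`, `failedAttempts`, and `jobAborts` are hypothetical names, the retry limit is passed in as a plain parameter, and the model assumes that each failed stage attempt reveals exactly one stale executor on the faulty host.

```scala
// Hypothetical model of the "multiple rounds of recomputation" problem.
// Nothing here calls into Spark; all names are illustrative only.
object FetchFailureRounds {

  /** Counts failed stage attempts until no stale map outputs remain on the faulty host. */
  def failedAttempts(executorsOnHost: Int, unregisterOnHost: Boolean): Int = {
    var stale: Set[String] = (1 to executorsOnHost).map(i => s"exec-$i").toSet
    var attempts = 0
    while (stale.nonEmpty) {
      attempts += 1
      // Simplifying assumption: one attempt hits one stale executor.
      val failed = stale.head
      // Host-wide unregistration clears every neighbour at once;
      // per-executor unregistration clears only the one that failed.
      stale = if (unregisterOnHost) Set.empty[String] else stale - failed
    }
    attempts
  }

  /** The job aborts once the failed attempts reach the (assumed) stage retry limit. */
  def jobAborts(executorsOnHost: Int, maxStageAttempts: Int, unregisterOnHost: Boolean): Boolean =
    failedAttempts(executorsOnHost, unregisterOnHost) >= maxStageAttempts

  def main(args: Array[String]): Unit = {
    // 8 executors on the faulty host, retry limit of 4 (both values chosen for illustration)
    println(jobAborts(8, 4, unregisterOnHost = false))
    println(jobAborts(8, 4, unregisterOnHost = true))
  }
}
```

Under these assumptions the per-executor behaviour needs one failed attempt per executor on the host, while host-wide unregistration needs a single failed attempt regardless of how many executors the host was running.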
[jira] [Updated] (SPARK-29551) There is a bug about fetch failed when an executor lost
[ https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-29551: - Description: A regression occurs when an executor is lost and that loss then causes a 'fetch failed'. When an executor is lost for some reason (e.g. the external shuffle service or the executor's host is lost) and the loss happens to coincide with a reduce stage failing to fetch from that executor, the current code only calls mapOutputTracker.unregisterMapOutput(shuffleId, mapIndex, bmAddress) to mark a single map output as broken in the map stage. The other map outputs on that executor are also unavailable, however, and they can only be resubmitted by a nested stage retry, which is the regression. The current code does call mapOutputTracker.removeOutputsOnHost(host) or mapOutputTracker.removeOutputsOnExecutor(execId) when a reduce stage fetch fails and the executor is still active, but it does NOT do so in the case described above. We should therefore distinguish the failedEpoch of 'executor lost' from the fetchFailedEpoch of 'fetch failed' to solve this problem. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the problem.
{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"),
      null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stage from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}
The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.sched
[jira] [Created] (SPARK-29551) There is a bug about fetch failed when an executor lost
weixiuli created SPARK-29551: Summary: There is a bug about fetch failed when an executor lost Key: SPARK-29551 URL: https://issues.apache.org/jira/browse/SPARK-29551 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3 Reporter: weixiuli Fix For: 2.4.3
A regression occurs when an executor is lost and that loss then causes a 'fetch failed'. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the problem.
{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"),
      null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stage from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}
The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:609)
Expected :1
Actual :3
org.scalatest.exceptions.TestFailedException: 3 did not equal 1
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
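The SPARK-29551 description proposes distinguishing the failedEpoch of 'executor lost' from the fetchFailedEpoch of 'fetch failed'. A minimal sketch of that idea follows, assuming the problem is that a single shared epoch record lets an earlier executor-lost event suppress a later fetch failure for the same executor. `EpochTracker`, `separateEpochs`, `executorLost`, and `fetchFailed` are hypothetical names chosen for illustration; this is not the DAGScheduler code and not necessarily how the linked change is implemented.

```scala
import scala.collection.mutable

// Hypothetical model of the proposal; the map names follow the ticket's wording
// and are not taken from the actual DAGScheduler source.
final class EpochTracker(separateEpochs: Boolean) {
  private val failedEpoch = mutable.Map.empty[String, Long]
  // With separateEpochs = false both events share one map, which models the
  // behaviour the ticket describes as the regression.
  private val fetchFailedEpoch: mutable.Map[String, Long] =
    if (separateEpochs) mutable.Map.empty[String, Long] else failedEpoch

  /** Records an executor-lost event; returns true if it is new for this epoch. */
  def executorLost(execId: String, epoch: Long): Boolean =
    advance(failedEpoch, execId, epoch)

  /** Records a fetch failure; returns true if map outputs should be unregistered. */
  def fetchFailed(execId: String, epoch: Long): Boolean =
    advance(fetchFailedEpoch, execId, epoch)

  // An event is processed only if no event at this epoch or later was seen before.
  private def advance(seen: mutable.Map[String, Long], execId: String, epoch: Long): Boolean =
    if (seen.get(execId).forall(_ < epoch)) { seen(execId) = epoch; true } else false
}

object EpochTrackerDemo {
  def main(args: Array[String]): Unit = {
    val shared = new EpochTracker(separateEpochs = false)
    shared.executorLost("exec-hostA2", 1L)
    // The fetch failure is swallowed because the executor is already marked lost.
    println(shared.fetchFailed("exec-hostA2", 1L))

    val split = new EpochTracker(separateEpochs = true)
    split.executorLost("exec-hostA2", 1L)
    // The fetch failure is still processed, so output cleanup can run.
    println(split.fetchFailed("exec-hostA2", 1L))
  }
}
```

In this model the shared record corresponds to the failing assertion in the test above (three map outputs remain registered), while the split record lets the fetch failure trigger the host-wide cleanup that the test expects.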
[jira] [Updated] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService
[ https://issues.apache.org/jira/browse/SPARK-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-26288: - Component/s: Spark Core > add initRegisteredExecutorsDB in ExternalShuffleService > --- > > Key: SPARK-26288 > URL: https://issues.apache.org/jira/browse/SPARK-26288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Shuffle, Spark Core >Affects Versions: 2.4.0 >Reporter: weixiuli >Priority: Major > > Spark on YARN uses a DB to record RegisteredExecutors information, which can > be reloaded and reused when the ExternalShuffleService is restarted. > In Spark standalone mode and in Spark on k8s, the RegisteredExecutors > information cannot be recorded, so it is lost when the ExternalShuffleService > is restarted. > To solve this problem, a method has been proposed and committed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService
[ https://issues.apache.org/jira/browse/SPARK-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-26288: - Description: Spark on YARN uses a DB to record RegisteredExecutors information, which can be reloaded and reused when the ExternalShuffleService is restarted. In Spark standalone mode and in Spark on k8s, the RegisteredExecutors information cannot be recorded, so it is lost when the ExternalShuffleService is restarted. To solve this problem, a method has been proposed and committed. was: Spark on YARN uses a DB to record RegisteredExecutors information; when the ExternalShuffleService restarts, that information can be reloaded and used again. Neither Spark standalone nor Spark on k8s can record its RegisteredExecutors information in a DB or by other means, so when the ExternalShuffleService restarts, the RegisteredExecutors information is lost, which is not what we want. This commit adds initRegisteredExecutorsDB, which can be used in either Spark standalone or Spark on k8s to record RegisteredExecutors information so that it can be reloaded and used again when the ExternalShuffleService restarts. > add initRegisteredExecutorsDB in ExternalShuffleService > --- > > Key: SPARK-26288 > URL: https://issues.apache.org/jira/browse/SPARK-26288 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Shuffle >Affects Versions: 2.4.0 >Reporter: weixiuli >Priority: Major > Fix For: 2.4.0 > > > Spark on YARN uses a DB to record RegisteredExecutors information, which can > be reloaded and reused when the ExternalShuffleService is restarted. > In Spark standalone mode and in Spark on k8s, the RegisteredExecutors > information cannot be recorded, so it is lost when the ExternalShuffleService > is restarted. > To solve this problem, a method has been proposed and committed.
[jira] [Created] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService
weixiuli created SPARK-26288: Summary: add initRegisteredExecutorsDB in ExternalShuffleService Key: SPARK-26288 URL: https://issues.apache.org/jira/browse/SPARK-26288 Project: Spark Issue Type: New Feature Components: Kubernetes, Shuffle Affects Versions: 2.4.0 Reporter: weixiuli Fix For: 2.4.0 Spark on YARN uses a DB to record RegisteredExecutors information; when the ExternalShuffleService restarts, that information can be reloaded and used again. Neither Spark standalone nor Spark on k8s can record its RegisteredExecutors information in a DB or by other means, so when the ExternalShuffleService restarts, the RegisteredExecutors information is lost, which is not what we want. This commit adds initRegisteredExecutorsDB, which can be used in either Spark standalone or Spark on k8s to record RegisteredExecutors information so that it can be reloaded and used again when the ExternalShuffleService restarts.
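The record-and-reload behaviour that SPARK-26288 asks for can be sketched in a few lines. This is an illustration only: `RegisteredExecutorStore`, `register`, and `isRegistered` are hypothetical names, and a tab-separated text file stands in for whatever DB the real ExternalShuffleService uses. The sketch also assumes that IDs and directories contain no tab or newline characters.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for a registered-executors DB; not Spark's implementation.
// A plain text file replaces the real storage engine to keep the sketch self-contained.
final class RegisteredExecutorStore(dbFile: Path) {
  // key: "appId/execId", value: the executor's shuffle directory (simplified)
  private var executors: Map[String, String] = load()

  // Reload whatever an earlier service instance recorded before it stopped.
  private def load(): Map[String, String] =
    if (!Files.exists(dbFile)) Map.empty
    else new String(Files.readAllBytes(dbFile), StandardCharsets.UTF_8)
      .split("\n")
      .filter(_.nonEmpty)
      .map { line =>
        val Array(key, dir) = line.split("\t", 2)
        key -> dir
      }
      .toMap

  /** Records a registration and persists the full map before returning. */
  def register(appId: String, execId: String, localDir: String): Unit = {
    executors += (s"$appId/$execId" -> localDir)
    val body = executors.map { case (key, dir) => s"$key\t$dir" }.mkString("\n")
    Files.write(dbFile, body.getBytes(StandardCharsets.UTF_8))
  }

  /** The check a shuffle service would make before serving a block. */
  def isRegistered(appId: String, execId: String): Boolean =
    executors.contains(s"$appId/$execId")
}

object RegisteredExecutorStoreDemo {
  def main(args: Array[String]): Unit = {
    val db = Files.createTempFile("registered-executors", ".db")
    new RegisteredExecutorStore(db).register("app-1", "428", "/tmp/shuffle")
    // A fresh instance models the service after a restart.
    val restarted = new RegisteredExecutorStore(db)
    println(restarted.isRegistered("app-1", "428"))
  }
}
```

Writing the whole map on every registration is the simplest choice for a sketch; a real service would more likely use an embedded key-value store with per-key writes.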