[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-543456393
@maryannxue
1. [#25295 (comment)](https://github.com/apache/spark/pull/25295#discussion_r334792481): this comment has been resolved in [commit](https://github.com/apache/spark/commit/51f10ed90f6b28c58fa1e576c8ceaa22e8c5f5ba), which lets `BlockStoreShuffleReader` take `blocksByAddress` directly instead of a map id.
2. I will resolve [#25295 (comment)](https://github.com/apache/spark/pull/25295#discussion_r336019058); `test("Exchange reuse")` can prove that "query stage reuse still working in presence of local shuffle reader". I will make some small updates to `test("Exchange reuse")`.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
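For readers unfamiliar with the shape of `blocksByAddress`: it groups the shuffle blocks to fetch by the block manager that stores them. The Python sketch below is a toy model, not Spark code. `BlockManagerId`, `MapOutput`, `shuffle_block_name` and `blocks_by_address_for_mapper` are hypothetical stand-ins, and the block names only imitate Spark's `shuffle_<shuffleId>_<mapId>_<reduceId>` naming. It shows how a caller could prepare the grouped list for a single mapper, covering a range of reduce partitions, and hand that list to a reader instead of a map id.

```python
from collections import namedtuple
from typing import List, Tuple

# Hypothetical stand-ins (not Spark classes): where a block lives, and one mapper's output sizes.
BlockManagerId = namedtuple("BlockManagerId", ["executor_id", "host"])
MapOutput = namedtuple("MapOutput", ["map_id", "location", "partition_sizes"])


def shuffle_block_name(shuffle_id: int, map_id: int, reduce_id: int) -> str:
    # Imitates the "shuffle_<shuffleId>_<mapId>_<reduceId>" naming of shuffle blocks.
    return f"shuffle_{shuffle_id}_{map_id}_{reduce_id}"


def blocks_by_address_for_mapper(
    shuffle_id: int, out: MapOutput, start: int, end: int
) -> List[Tuple[BlockManagerId, List[Tuple[str, int]]]]:
    """Group the blocks of reduce partitions [start, end) of ONE mapper by address.

    All blocks come from the same mapper, so the result has at most one address.
    Zero-size partitions are skipped in this sketch.
    """
    blocks = [
        (shuffle_block_name(shuffle_id, out.map_id, r), out.partition_sizes[r])
        for r in range(start, end)
        if out.partition_sizes[r] > 0
    ]
    return [(out.location, blocks)] if blocks else []


exec1 = BlockManagerId("exec-1", "host-a")
out = MapOutput(map_id=3, location=exec1, partition_sizes=[10, 0, 25])
for address, blocks in blocks_by_address_for_mapper(0, out, 0, 3):
    print(address.host, blocks)  # host-a [('shuffle_0_3_0', 10), ('shuffle_0_3_2', 25)]
```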
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-541502159 @cloud-fan Can you help review the updated patch? Thanks!
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-539301442 Resolved the conflicts.
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-532552503 @cloud-fan I moved the rule that converts the shuffle reader to a local shuffle reader so that it runs before `ReduceNumShufflePartitions`.
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-530640464 @cloud-fan The specific `ShuffleRDD` reads the entire output of one mapper locally, which ensures that no data is transferred over the network.
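To make the contrast concrete, here is a toy Python model of which blocks each task reads; the function names are hypothetical and not Spark APIs. A regular shuffle reader gives reduce task `r` partition `r` of every mapper, so each task pulls from every mapper's host. The local reader described above instead gives task `m` all reduce partitions of mapper `m`, so, assuming the task is scheduled on that mapper's host, every block it reads is local. This regrouping is acceptable here because, once the sort-merge join has become a broadcast hash join, the streamed side no longer has to be partitioned by the join key.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (map_id, reduce_id)


def regular_read_plan(num_mappers: int, num_reducers: int) -> Dict[int, List[Block]]:
    # Reduce task r reads partition r from every mapper's output.
    return {r: [(m, r) for m in range(num_mappers)] for r in range(num_reducers)}


def local_read_plan(num_mappers: int, num_reducers: int) -> Dict[int, List[Block]]:
    # Task m reads every reduce partition of mapper m's output, and nothing else.
    return {m: [(m, r) for r in range(num_reducers)] for m in range(num_mappers)}


def hosts_per_task(plan: Dict[int, List[Block]], mapper_host: Dict[int, str]) -> Dict[int, int]:
    # Number of distinct hosts each task has to read from.
    return {task: len({mapper_host[m] for m, _ in blocks}) for task, blocks in plan.items()}


mapper_host = {0: "host-a", 1: "host-b", 2: "host-c"}
print(hosts_per_task(regular_read_plan(3, 4), mapper_host))  # {0: 3, 1: 3, 2: 3, 3: 3}
print(hosts_per_task(local_read_plan(3, 4), mapper_host))    # {0: 1, 1: 1, 2: 1}
```

Both plans read the same 12 blocks; only the grouping differs, and the local plan yields one task per mapper rather than one per reduce partition.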
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-530319218 @cloud-fan Thanks for your review! I think that when the shuffle blocks exist locally, [ShuffleBlockFetcherIterator](https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L100) already reads them locally, even when the shuffle service is used. Please correct me if my understanding is wrong! If so, do we still need to optimize it to read locally?
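The question above hinges on how the fetch path treats blocks that are already on the reading executor. As a rough, hypothetical model (the names are illustrative, not Spark's API), the split can be pictured as partitioning the grouped blocks by whether their address matches the local executor. This sketch assumes locality is decided by comparing executor ids alone; only the remote group involves network fetches.

```python
from collections import namedtuple
from typing import List, Tuple

# Hypothetical stand-in for a block location.
BlockManagerId = namedtuple("BlockManagerId", ["executor_id", "host"])


def split_local_and_remote(blocks_by_address, local_executor_id: str):
    """Toy model: separate blocks held by this executor from those held elsewhere.

    Assumption: a block counts as local only when its executor id matches ours.
    """
    local: List[Tuple[str, int]] = []
    remote: List[Tuple[BlockManagerId, List[Tuple[str, int]]]] = []
    for address, blocks in blocks_by_address:
        if address.executor_id == local_executor_id:
            local.extend(blocks)  # read directly from local storage
        else:
            remote.append((address, blocks))  # must be fetched over the network
    return local, remote


me = BlockManagerId("exec-1", "host-a")
other = BlockManagerId("exec-2", "host-b")
local, remote = split_local_and_remote(
    [(me, [("shuffle_0_0_1", 10)]), (other, [("shuffle_0_1_1", 7)])], "exec-1"
)
print(local)        # [('shuffle_0_0_1', 10)]
print(len(remote))  # 1
```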
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-530202694 @cloud-fan Could you help review when you have time? Thanks very much for your help.
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-530202124 Fixed the conflicts.
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-518092402 We have run functionality and performance tests on 3TB TPC-DS, and the results are shown [here](https://docs.google.com/spreadsheets/d/1jtT3tCiNjtUbjOelpf50w7Z5JNl2YhzrBnbpF-7EhTw/edit#gid=0). Q82 shows a 1.76x performance improvement with this PR, and no queries show significant performance degradation. @carsonwang @cloud-fan, could you help review when you have time? Thanks for your help.