Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
maheshk114 commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2039261974 @attilapiros Can you please take a look at this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
github-actions[bot] commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2227537097 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
github-actions[bot] closed pull request #45836: [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. URL: https://github.com/apache/spark/pull/45836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2261675147 @maheshk114 I am sorry I missed this PR earlier. Let me try to reopen it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2261700784 retest -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1699239133 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -2112,8 +2112,10 @@ private[spark] class BlockManager( hasRemoveBlock = true if (tellMaster) { // Only update storage level from the captured block status before deleting, so that -// memory size and disk size are being kept for calculating delta. -reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = StorageLevel.NONE)) +// memory size and disk size are being kept for calculating delta. Reset the replica Review Comment: ```suggestion // memory size and disk size are being kept for calculating delta. Reset the replication ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1699239133 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -2112,8 +2112,10 @@ private[spark] class BlockManager( hasRemoveBlock = true if (tellMaster) { // Only update storage level from the captured block status before deleting, so that -// memory size and disk size are being kept for calculating delta. -reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = StorageLevel.NONE)) +// memory size and disk size are being kept for calculating delta. Reset the replica Review Comment: ```suggestion // memory size and disk size are being kept for calculating delta. Reset the replication ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
github-actions[bot] closed pull request #45836: [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. URL: https://github.com/apache/spark/pull/45836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -2112,8 +2112,10 @@ private[spark] class BlockManager( hasRemoveBlock = true if (tellMaster) { // Only update storage level from the captured block status before deleting, so that -// memory size and disk size are being kept for calculating delta. -reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = StorageLevel.NONE)) +// memory size and disk size are being kept for calculating delta. Reset the replica +// count 0 in storage level to notify that its a remove operation. Review Comment: ```suggestion // memory size and disk size are being kept for calculating delta. Reset the replication // factor to 0 to indicate its a remove operation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2261706525 @maheshk114 could you please open a new PR with the same content? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
mridulm commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2262037090 I think it was the stale tag which caused it to get closed again @attilapiros. Removed it, let us see if it stays open :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -2112,8 +2112,10 @@ private[spark] class BlockManager( hasRemoveBlock = true if (tellMaster) { // Only update storage level from the captured block status before deleting, so that -// memory size and disk size are being kept for calculating delta. -reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = StorageLevel.NONE)) +// memory size and disk size are being kept for calculating delta. Reset the replica +// count 0 in storage level to notify that its a remove operation. Review Comment: ```suggestion // memory size and disk size are being kept for calculating delta, but reset the replication // factor to 0 to indicate its a remove operation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -2112,8 +2112,10 @@ private[spark] class BlockManager( hasRemoveBlock = true if (tellMaster) { // Only update storage level from the captured block status before deleting, so that -// memory size and disk size are being kept for calculating delta. -reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = StorageLevel.NONE)) +// memory size and disk size are being kept for calculating delta. Reset the replica +// count 0 in storage level to notify that its a remove operation. Review Comment: ```suggestion // memory size and disk size are being kept for calculating delta but reset the replication // factor to 0 to indicate its a remove operation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on code in PR #45836: URL: https://github.com/apache/spark/pull/45836#discussion_r1702385601 ## core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala: ## @@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite } } + test("Test block location after replication with SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") { +val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT) +val store1 = makeBlockManager(1, "host-1") +val store2 = makeBlockManager(1, "host-2") +assert(master.getPeers(store1.blockManagerId).toSet === Set(store2.blockManagerId)) + +val blockId = RDDBlockId(1, 2) +val message = new Array[Byte](1000) + +// if SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled, then shuffle port should be present. +store1.putSingle(blockId, message, StorageLevel.DISK_ONLY) +assert(master.getLocations(blockId).contains( + BlockManagerId("host-1", "localhost", shuffleServicePort, None))) + +// after block is removed, shuffle port should be removed. Review Comment: ```suggestion // After block is removed the external shuffle service should be removed too from the locations ``` ## core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala: ## @@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite } } + test("Test block location after replication with SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") { +val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT) +val store1 = makeBlockManager(1, "host-1") +val store2 = makeBlockManager(1, "host-2") +assert(master.getPeers(store1.blockManagerId).toSet === Set(store2.blockManagerId)) Review Comment: We do not need this. ## core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala: ## @@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite } } + test("Test block location after replication with SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") { +val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT) +val store1 = makeBlockManager(1, "host-1") +val store2 = makeBlockManager(1, "host-2") +assert(master.getPeers(store1.blockManagerId).toSet === Set(store2.blockManagerId)) + +val blockId = RDDBlockId(1, 2) +val message = new Array[Byte](1000) + +// if SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled, then shuffle port should be present. Review Comment: ```suggestion // As SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled the external shuffle service should be listed ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros closed pull request #45836: [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. URL: https://github.com/apache/spark/pull/45836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on PR #47779: URL: https://github.com/apache/spark/pull/47779#issuecomment-2292371405 This is an updated version of https://github.com/apache/spark/pull/45836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]
attilapiros commented on PR #47779: URL: https://github.com/apache/spark/pull/47779#issuecomment-2292373648 cc @mridulm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org