Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-04-05 Thread via GitHub


maheshk114 commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2039261974

   @attilapiros Can you please take a look at this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-14 Thread via GitHub


github-actions[bot] commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2227537097

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-15 Thread via GitHub


github-actions[bot] closed pull request #45836: [SPARK-47702][CORE] Remove 
Shuffle service endpoint from the locations list when RDD block is removed form 
a node.
URL: https://github.com/apache/spark/pull/45836


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2261675147

   @maheshk114 I am sorry I missed this PR earlier. Let me try to reopen it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2261700784

   retest


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1699239133


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -2112,8 +2112,10 @@ private[spark] class BlockManager(
   hasRemoveBlock = true
   if (tellMaster) {
 // Only update storage level from the captured block status before 
deleting, so that
-// memory size and disk size are being kept for calculating delta.
-reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = 
StorageLevel.NONE))
+// memory size and disk size are being kept for calculating delta. 
Reset the replica

Review Comment:
   ```suggestion
   // memory size and disk size are being kept for calculating delta. 
Reset the replication
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1699239133


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -2112,8 +2112,10 @@ private[spark] class BlockManager(
   hasRemoveBlock = true
   if (tellMaster) {
 // Only update storage level from the captured block status before 
deleting, so that
-// memory size and disk size are being kept for calculating delta.
-reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = 
StorageLevel.NONE))
+// memory size and disk size are being kept for calculating delta. 
Reset the replica

Review Comment:
   ```suggestion
   // memory size and disk size are being kept for calculating delta. 
Reset the replication
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


github-actions[bot] closed pull request #45836: [SPARK-47702][CORE] Remove 
Shuffle service endpoint from the locations list when RDD block is removed form 
a node.
URL: https://github.com/apache/spark/pull/45836


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -2112,8 +2112,10 @@ private[spark] class BlockManager(
   hasRemoveBlock = true
   if (tellMaster) {
 // Only update storage level from the captured block status before 
deleting, so that
-// memory size and disk size are being kept for calculating delta.
-reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = 
StorageLevel.NONE))
+// memory size and disk size are being kept for calculating delta. 
Reset the replica
+// count 0 in storage level to notify that its a remove operation.

Review Comment:
   ```suggestion
   // memory size and disk size are being kept for calculating delta. 
Reset the replication
   // factor to 0 to indicate its a remove operation.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


attilapiros commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2261706525

   @maheshk114 could you please open a new PR with the same content?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-31 Thread via GitHub


mridulm commented on PR #45836:
URL: https://github.com/apache/spark/pull/45836#issuecomment-2262037090

   I think it was the stale tag which caused it to get closed again 
@attilapiros. Removed it, let us see if it stays open :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-01 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -2112,8 +2112,10 @@ private[spark] class BlockManager(
   hasRemoveBlock = true
   if (tellMaster) {
 // Only update storage level from the captured block status before 
deleting, so that
-// memory size and disk size are being kept for calculating delta.
-reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = 
StorageLevel.NONE))
+// memory size and disk size are being kept for calculating delta. 
Reset the replica
+// count 0 in storage level to notify that its a remove operation.

Review Comment:
   ```suggestion
   // memory size and disk size are being kept for calculating delta, 
but reset the replication
   // factor to 0 to indicate its a remove operation.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-01 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1699239788


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -2112,8 +2112,10 @@ private[spark] class BlockManager(
   hasRemoveBlock = true
   if (tellMaster) {
 // Only update storage level from the captured block status before 
deleting, so that
-// memory size and disk size are being kept for calculating delta.
-reportBlockStatus(blockId, blockStatus.get.copy(storageLevel = 
StorageLevel.NONE))
+// memory size and disk size are being kept for calculating delta. 
Reset the replica
+// count 0 in storage level to notify that its a remove operation.

Review Comment:
   ```suggestion
   // memory size and disk size are being kept for calculating delta 
but reset the replication
   // factor to 0 to indicate its a remove operation.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-02 Thread via GitHub


attilapiros commented on code in PR #45836:
URL: https://github.com/apache/spark/pull/45836#discussion_r1702385601


##
core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala:
##
@@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
 }
   }
 
+  test("Test block location after replication with 
SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") {
+val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT)
+val store1 = makeBlockManager(1, "host-1")
+val store2 = makeBlockManager(1, "host-2")
+assert(master.getPeers(store1.blockManagerId).toSet === 
Set(store2.blockManagerId))
+
+val blockId = RDDBlockId(1, 2)
+val message = new Array[Byte](1000)
+
+// if SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled, then shuffle port 
should be present.
+store1.putSingle(blockId, message, StorageLevel.DISK_ONLY)
+assert(master.getLocations(blockId).contains(
+  BlockManagerId("host-1", "localhost", shuffleServicePort, None)))
+
+// after block is removed, shuffle port should be removed.

Review Comment:
   ```suggestion
   // After block is removed the external shuffle service should be removed 
too from the locations
   ```



##
core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala:
##
@@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
 }
   }
 
+  test("Test block location after replication with 
SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") {
+val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT)
+val store1 = makeBlockManager(1, "host-1")
+val store2 = makeBlockManager(1, "host-2")
+assert(master.getPeers(store1.blockManagerId).toSet === 
Set(store2.blockManagerId))

Review Comment:
   We do not need this.



##
core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala:
##
@@ -296,6 +306,26 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
 }
   }
 
+  test("Test block location after replication with 
SHUFFLE_SERVICE_FETCH_RDD_ENABLED enabled") {
+val shuffleServicePort = conf.get(SHUFFLE_SERVICE_PORT)
+val store1 = makeBlockManager(1, "host-1")
+val store2 = makeBlockManager(1, "host-2")
+assert(master.getPeers(store1.blockManagerId).toSet === 
Set(store2.blockManagerId))
+
+val blockId = RDDBlockId(1, 2)
+val message = new Array[Byte](1000)
+
+// if SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled, then shuffle port 
should be present.

Review Comment:
   ```suggestion
   // As SHUFFLE_SERVICE_FETCH_RDD_ENABLED is enabled the external shuffle 
service should be listed
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-15 Thread via GitHub


attilapiros closed pull request #45836: [SPARK-47702][CORE] Remove Shuffle 
service endpoint from the locations list when RDD block is removed form a node.
URL: https://github.com/apache/spark/pull/45836


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-15 Thread via GitHub


attilapiros commented on PR #47779:
URL: https://github.com/apache/spark/pull/47779#issuecomment-2292371405

   This is an updated version of https://github.com/apache/spark/pull/45836
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-08-15 Thread via GitHub


attilapiros commented on PR #47779:
URL: https://github.com/apache/spark/pull/47779#issuecomment-2292373648

   cc @mridulm


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org