prakharjain09 opened a new pull request #27539: [SPARK-30786] [BLOCK MANAGER] 
Fix Block replication failure propagation issue in BlockManager
URL: https://github.com/apache/spark/pull/27539
 
 
   What changes were proposed in this pull request?
   Currently the uploadBlockSync API in BlockTransferService always succeeds, 
regardless of whether the peer BlockManager was actually able to store the 
replicated block. This PR makes NettyBlockRpcServer invoke the onFailure 
callback whenever it is unable to store the block locally, for any reason. 
The onFailure callback ensures that the BlockTransferService on the client 
side sees the failure and retries replicating the block on some other 
BlockManager. 
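
   The failure propagation described above can be illustrated with a small, 
self-contained Python sketch. It is not Spark code: UploadCallback, 
handle_upload, and store_block are hypothetical names used only to show the 
idea that the server-side handler should report the real outcome of the store 
operation instead of always signalling success.

```python
from typing import Callable, Optional


class UploadCallback:
    """Hypothetical stand-in for an RPC response callback (not a Spark class)."""

    def __init__(self) -> None:
        self.succeeded: bool = False
        self.error: Optional[Exception] = None

    def on_success(self) -> None:
        self.succeeded = True

    def on_failure(self, error: Exception) -> None:
        self.error = error


def handle_upload(store_block: Callable[[], bool], callback: UploadCallback) -> None:
    """Server-side handler sketch: report what actually happened to the block.

    store_block is an assumed function that returns True when the block was
    stored and False (or raises) when it was not.
    """
    try:
        stored = store_block()
    except Exception as exc:
        # The store attempt itself blew up: surface the error to the client.
        callback.on_failure(exc)
        return

    if stored:
        callback.on_success()
    else:
        # Without this branch the client would see a success it never got.
        callback.on_failure(RuntimeError("block could not be stored"))
```

   With a handler shaped like this, a client that waits on the callback can 
tell a stored block from a rejected one and decide whether to try another peer.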
   
   Why are the changes needed?
   Currently the Spark block replication retry logic does not work correctly. 
It doesn't retry on other BlockManagers even when replication fails on one of 
the peers.
   
   A user can cache a DataFrame with different replication factors. For 
example, df.persist(StorageLevel.MEMORY_ONLY_2) caches each partition on two 
different BlockManagers. When a DataFrame partition is computed for the first 
time, it is first stored on the local BlockManager and then replicated to 
other BlockManagers according to the configured replication factor. 
Replication to a peer might fail because of memory or network issues, so there 
is already a provision to retry the replication on some other peer, based on 
the "spark.storage.maxReplicationFailures" config. Currently, when this 
replication fails, the client does not learn about the failure and so it 
doesn't retry on other peers. This PR fixes this issue.
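
   The retry behaviour described above can be sketched as a simplified Python 
loop. This is an illustration under stated assumptions, not the BlockManager 
implementation: replicate, upload, replicas_needed, and max_failures are 
hypothetical names, and max_failures only plays the role that 
"spark.storage.maxReplicationFailures" plays in the description.

```python
from typing import Callable, List


def replicate(
    peers: List[str],
    upload: Callable[[str], None],
    replicas_needed: int,
    max_failures: int,
) -> List[str]:
    """Try peers in order until enough replicas exist or failures exceed the limit.

    upload(peer) is assumed to raise an exception when the peer could not
    store the block. If it never raises, failures are invisible to this loop.
    """
    replicated_to: List[str] = []
    failures = 0
    for peer in peers:
        if len(replicated_to) >= replicas_needed or failures > max_failures:
            break
        try:
            upload(peer)
            replicated_to.append(peer)
        except Exception:
            # A visible failure is what lets the loop move on to another peer.
            failures += 1
    return replicated_to
```

   The sketch makes the dependency explicit: the loop can only retry on 
another peer when upload actually raises. If the upload call reports success 
even though the peer rejected the block, the loop counts that peer as a 
replica and stops early.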
   
   
   Does this PR introduce any user-facing change?
   No.
   
   How was this patch tested?
   Added a unit test.
   
