xkrogen opened a new pull request #35358:
URL: https://github.com/apache/spark/pull/35358


   ### What changes were proposed in this pull request?
   This updates `BlockManagerDecommissioner` to stop treating "remote", the 
placeholder hostname used by `FALLBACK_BLOCK_MANAGER_ID`, as a valid hostname 
and attempting a network transfer to it. If the `peer` it encounters matches 
the fallback block manager ID, it now goes directly to `fallbackStorage` 
instead of first treating it like a real block manager ID.
   
   In addition, this reverts the changes from SPARK-37318, which should no 
longer be necessary now that the underlying issue is resolved.
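   The control flow described above can be sketched roughly as follows. This is a simplified, self-contained Scala model, not the code in this patch. `PeerId`, `MigrationSketch`, `chooseAction` and the `Action` types are hypothetical names. Only the "remote" host comes from the description above; the executor ID and port in `FallbackPeer` are illustrative assumptions.

```scala
// Hypothetical, simplified stand-in for a block manager ID (not Spark's BlockManagerId).
final case class PeerId(executorId: String, host: String, port: Int)

object MigrationSketch {
  // Assumed placeholder peer: the "remote" host matches the description above;
  // the executor ID and port are illustrative only.
  val FallbackPeer: PeerId = PeerId("fallback", "remote", 7337)

  sealed trait Action
  case object CopyToFallbackStorage extends Action
  final case class NetworkTransfer(host: String, port: Int) extends Action

  // Pick a migration path for `peer`. The fallback check is a plain equality
  // comparison, so the placeholder host is never passed to a network/DNS layer.
  def chooseAction(peer: PeerId): Action =
    if (peer == FallbackPeer) CopyToFallbackStorage
    else NetworkTransfer(peer.host, peer.port)

  def main(args: Array[String]): Unit = {
    println(chooseAction(FallbackPeer))
    println(chooseAction(PeerId("exec-2", "worker-2.example.com", 43210)))
  }
}
```

   The key design point is that the placeholder is recognized by comparing IDs up front, rather than by waiting for a connection attempt to "remote" to fail.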
   
   ### Why are the changes needed?
   See SPARK-38062 for a much more detailed explanation. The gist of it is that:
   - Attempting to resolve "remote" can behave unexpectedly in some DNS 
environments. This can cause `FallbackStorageSuite` tests to fail, and could 
also cause issues in a production deployment.
   - SPARK-37318 "fixes" the tests by skipping them if such a DNS environment 
is detected, but this has the obvious drawback of disabling the tests, and 
doesn't address the problem for production environments.
   - Even if resolving "remote" fails quickly, as the current code expects, 
treating this placeholder as a valid hostname is semantically wrong.
   
   ### Does this PR introduce _any_ user-facing change?
   `FallbackStorage` may be reached slightly more quickly, since this patch 
removes an unnecessary lookup step, though the difference should be 
negligible in most environments. No other user-facing changes.
   
   ### How was this patch tested?
   In the DNS environment where my company runs automated unit tests, we hit 
an issue very similar to the one described in SPARK-37318. Without this patch, 
tests in `FallbackStorageSuite` consistently fail by exceeding their timeouts. 
With this patch, the tests consistently (and quickly!) succeed.




