zhengruifeng opened a new pull request, #40793: URL: https://github.com/apache/spark/pull/40793
### What changes were proposed in this pull request?

`TorchDistributorLocalUnitTestsOnConnect` and `TorchDistributorLocalUnitTestsIIOnConnect` were unstable and occasionally got stuck, so they had been disabled. I could not reproduce the issue locally.

This PR re-enables them. I found that the old Torch tests set up the Connect sessions in `setUp` and closed them in `tearDown`. Such session operations are expensive, so we should use `setUpClass` and `tearDownClass` instead. After this change, the related tests seem much more stable.

I think the root cause is still related to resources: since TorchDistributor works in barrier mode, the tests just keep waiting when there are not enough resources in GitHub Actions.

### Why are the changes needed?

For test coverage.

### Does this PR introduce _any_ user-facing change?

Re-enable `TorchDistributorLocalUnitTestsOnConnect` and `TorchDistributorLocalUnitTestsIIOnConnect`.

### How was this patch tested?

CI
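
The move from per-test to per-class session setup described in this PR can be illustrated with a minimal, self-contained sketch. It does not use Spark: `FakeSession` is a hypothetical stand-in for an expensive Connect session, and the test classes and the `sessions_created` helper are invented for illustration rather than taken from the actual test suite.

```python
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive Spark Connect session."""

    created = 0  # number of sessions started so far

    def __init__(self):
        FakeSession.created += 1
        self.active = True

    def stop(self):
        self.active = False


class PerTestSession(unittest.TestCase):
    """Old pattern: a fresh session is started and stopped for every test."""

    def setUp(self):
        self.spark = FakeSession()

    def tearDown(self):
        self.spark.stop()

    def test_a(self):
        self.assertTrue(self.spark.active)

    def test_b(self):
        self.assertTrue(self.spark.active)


class PerClassSession(unittest.TestCase):
    """New pattern: one session is shared by all tests in the class."""

    @classmethod
    def setUpClass(cls):
        cls.spark = FakeSession()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_a(self):
        self.assertTrue(self.spark.active)

    def test_b(self):
        self.assertTrue(self.spark.active)


def sessions_created(case):
    """Run one TestCase class and return how many sessions it started."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    assert result.wasSuccessful()
    return FakeSession.created
```

With two test methods per class, the per-test variant starts two sessions and the per-class variant starts one. The saving grows with the number of test methods, which is presumably why the change eases resource pressure on a constrained CI runner.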
