Dear All,

I hope this email finds you well.

I am writing to inform you about a critical bug identified in versions 0.5.0 through 0.5.2 of Celeborn. This bug may lead to data loss under specific conditions. Specifically, if data replication is enabled and the primary partition location is lost, the replicated partition location could be disregarded.

To resolve this issue, we strongly recommend updating your Celeborn servers to version 0.5.3 or higher as soon as possible. If an update is not feasible right now, you can prevent this bug by disabling the data replication feature.

Additionally, for users of Spark 3.x, there is an important pull request [0] that you need to cherry-pick into your Celeborn client for complete compatibility and functionality.

Thank you for your prompt attention to this matter. If you have any questions or require further assistance, please do not hesitate to reach out.

Best regards,
Ethan Feng

[0] https://github.com/apache/celeborn/pull/3070