Hey Helix folks,

We ran into a fun issue recently. Apache Helix v1.0.3 was released on April 14 and v1.0.4 on June 9. Between those two releases, it looks like a backward-incompatible change may have been introduced (on June 3rd) that prevents Helix v1.0.4 from working correctly against Zookeeper 3.4.x clusters.
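To tell which clusters are still on 3.4.x (and would therefore be affected), the server version can be read with Zookeeper's "srvr" four-letter command. Below is a minimal Java sketch. It assumes the server answers "srvr" on its client port, which is always the case on 3.4.x but on 3.5+ depends on 4lw.commands.whitelist. It also assumes the reply contains a "Zookeeper version: X.Y.Z" line. The class and method names (ZkVersionCheck, parseVersion, supportsCreate2) are hypothetical. Treating 3.5.0 as the first release with the create2 operation (the one behind Create2Callback) reflects our reading of the Zookeeper history; Helix does not state it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: asks a Zookeeper server for its version via "srvr".
public class ZkVersionCheck {
    // Matches e.g. "Zookeeper version: 3.4.14-<sha>, built on ..." (case-insensitive to be safe).
    private static final Pattern VERSION =
        Pattern.compile("Zookeeper version: (\\d+)\\.(\\d+)\\.(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns {major, minor, patch} parsed from srvr output, or null if no version line is present. */
    static int[] parseVersion(String srvrOutput) {
        Matcher m = VERSION.matcher(srvrOutput);
        if (!m.find()) {
            return null; // e.g. "srvr is not executed because it is not in the whitelist."
        }
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
        };
    }

    /** Assumption: create2 (and thus Create2Callback) needs a 3.5.0+ server. */
    static boolean supportsCreate2(int[] v) {
        return v[0] > 3 || (v[0] == 3 && v[1] >= 5);
    }

    /** Sends the "srvr" command and returns the full response; the server closes the connection afterwards. */
    static String fetchSrvr(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5000);
            socket.setSoTimeout(5000);
            OutputStream out = socket.getOutputStream();
            out.write("srvr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        int[] v = parseVersion(fetchSrvr(host, port));
        if (v == null) {
            System.out.println("No version line found; is srvr enabled on this server?");
            return;
        }
        System.out.printf("%s:%d runs %d.%d.%d; create2 supported: %b%n",
            host, port, v[0], v[1], v[2], supportsCreate2(v));
    }
}
```

Usage would be something like `java ZkVersionCheck zk-host 2181`, run once per ensemble member. Any server reporting create2 as unsupported is one that Helix v1.0.4 would presumably have trouble with.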
I do acknowledge that Zookeeper 3.4.x reached end of life on June 1st, 2020 (https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), and that certainly factors in, but it's what our organization's team currently supports. So, unfortunately, we're stuck between a rock and a hard place at the moment:

- We can't go back to v1.0.2 because it lacks the Log4j fixes.
- We can't use v1.0.3 due to the corruption issue.
- We can't move ahead to v1.0.4 due to the compatibility issue with Zookeeper 3.4.x.

I have a fork we were previously using (https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1), but that's not a long-term solution either.

The issue is a bit subtle. From v1.0.2 to v1.0.3, the org.apache.zookeeper version required by helix/zookeeper-api was bumped from 3.4.13 to 3.5.9:

- v1.0.2: https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
- v1.0.3: https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54

That bump, in and of itself, was not breaking. Then, from v1.0.3 to v1.0.4, this PR (https://github.com/apache/helix/pull/2138/files) introduced code changes that rely specifically on the 3.5.x Zookeeper API. For example, the PR added "import org.apache.zookeeper.AsyncCallback.Create2Callback" to "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java". Create2Callback belongs to the create2 operation, which only exists from Zookeeper 3.5 onward, so this change is backward-incompatible with 3.4.x servers.

So the net result is that, unfortunately, the drift over the past two versions (from v1.0.2 to v1.0.4) has left Apache Helix incompatible with Zookeeper 3.4.x clusters.

I wanted to post this here:

1. To see if you were all aware of it (since it may hit other customers as well, and we were a bit blindsided by it).
2. To see if you had any ideas on how to work with or around this.

Our long-term plan will obviously be to move to newer Zookeeper clusters as soon as we can, but that's likely not going to be a quick turnaround for us. In the short term, we'll need to revert to our v1.0.2 fork.

Does the team have any other comments or suggestions on dealing with this issue? Is this correctable at the project level (I suspect that will be tough)?

Thanks much!
~Brent
