Hi Ryan,

Which Curator version did you use? Is it possibly covered by CURATOR-673[1] (Complete BackgroundCallback if curator got closed or exceptions from no-zookeeper world)?
[1]: https://issues.apache.org/jira/browse/CURATOR-673

Best,
Kezhu Wang

On Sat, May 20, 2023 at 12:44 PM tison <wander4...@gmail.com> wrote:
>
> Yep.
>
> You're welcome to file a ticket on our JIRA project to share your
> reproduction code - https://issues.apache.org/jira/projects/CURATOR
>
> If you don't have a JIRA account yet, you can self-request at
> https://selfserve.apache.org/jira-account.html
>
> Best,
> tison.
>
>
> Ryan Ruel <r.r...@icloud.com> wrote on Fri, May 19, 2023 at 22:38:
>>
>> It will require some effort to get there (without exposing proprietary
>> code), but the goal would be reproduction (hopefully via unit test).
>>
>> Am I correct, then, that these conditions are unexpected?
>>
>> ----
>> Ryan Ruel
>> r...@ryanruel.com
>>
>> On May 19, 2023, at 9:33 AM, tison <wander4...@gmail.com> wrote:
>>
>>
>> Hi Ryan,
>>
>> Thanks for reporting your use case!
>>
>> From your description, it's hard to tell which part is the root cause
>> of the hang. Could you provide a reproduction repo, or share the
>> significant code snippets with related logs?
>>
>> Best,
>> tison.
>>
>>
>> Ryan Ruel <r.r...@icloud.com> wrote on Fri, May 19, 2023 at 19:44:
>>>
>>> I’ve encountered an issue in my application where I don’t seem to be
>>> receiving exceptions back from Curator in certain error scenarios.
>>>
>>> My application:
>>> * Has a pool of worker threads working on different jobs which need to
>>> read/write ZNodes in ZK.
>>> * Utilizes the Curator ModeledFramework to serialize data within the ZNodes.
>>> * Because ModeledFramework is built upon Curator Async, uses
>>> Future.get() to wait for Curator to respond with either a success or
>>> failure result for each operation.
>>>
>>> Under heavy load, where the ZK connectivity becomes flaky, I occasionally
>>> encounter a case where all my worker threads block on calls to Future.get().
>>>
>>> When a connection loss event occurs (or if ZK is just too busy to reply in
>>> a timely manner), I’d expect to see exceptions thrown by Curator, but this
>>> never happens… the application threads wedge indefinitely.
>>>
>>> Is the expectation when using the Async APIs that we should always receive a
>>> success/failure response?
>>>
>>> Or is the expectation that the application should implement an additional
>>> timer in the event that Curator doesn’t respond?
>>>
>>> If it’s the former, I can dig further into why Curator is not responding.
>>>
>>> /Ryan
>>>
>>> ----
>>> Ryan Ruel
>>> r...@ryanruel.com
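
For illustration only, the "additional timer" asked about in the original message could be as small as a bounded wait on the returned stage. The sketch below is not taken from the thread and is not a statement of what Curator expects: it uses plain JDK types (relying only on Curator's AsyncStage being a CompletionStage), and the class name BoundedWait, the await helper, and the 200 ms timeout are hypothetical. It keeps a worker thread from wedging, but does not explain why a stage might never complete.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: not part of Curator.
public final class BoundedWait {
    private BoundedWait() {}

    /**
     * Waits for the stage to complete, but no longer than the given timeout.
     * Throws TimeoutException if neither a result nor a failure arrives in time,
     * and ExecutionException if the stage completed exceptionally.
     */
    public static <T> T await(CompletionStage<T> stage, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return stage.toCompletableFuture().get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an async operation that never completes.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        try {
            await(stuck, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The worker thread is released and can retry or fail the job.
            System.out.println("timed out waiting for response");
        }

        // Stand-in for an operation that completed normally.
        CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
        System.out.println(await(done, 200, TimeUnit.MILLISECONDS));
    }
}
```

A worker thread would call BoundedWait.await(...) in place of an unbounded Future.get(); what to do on timeout (retry, fail the job, log and continue) is left to the application.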