It will take some effort to get there (without exposing proprietary code), but the goal would be a reproduction, hopefully via a unit test.
Am I correct, then, that these conditions are unexpected?

Hi Ryan,
Thanks for reporting your use case!
From your description alone, it's hard to determine which part could be the root cause of the hang. Could you provide a repository that reproduces the issue, or share the relevant code snippet along with the related logs?
I’ve encountered an issue in my application where I don’t seem to be receiving exceptions back from Curator in certain error scenarios.
My application:
* Has a pool of worker threads working on different jobs that need to read and write ZNodes in ZK.
* Utilizes the Curator ModeledFramework to serialize data within the ZNodes.
* Because ModeledFramework is built on Curator Async, the application calls Future.get() to wait for Curator to respond with either a success or a failure result for each operation.
Under heavy load, when ZK connectivity becomes flaky, I occasionally hit a case where all of my worker threads block on calls to Future.get().
When a connection loss event occurs (or when ZK is simply too busy to reply in a timely manner), I'd expect Curator to throw exceptions, but this never happens… the application threads wedge indefinitely.
When using the Async APIs, should we always expect either a success or a failure response?
Or is the application expected to implement an additional timer in case Curator doesn't respond?
If it’s the former, I can dig further into why Curator is not responding.
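For the second option (an application-side timer), a minimal sketch of a bounded wait is shown below. It assumes only that the async call returns a `java.util.concurrent.CompletionStage`; Curator Async's `AsyncStage` is one, so `toCompletableFuture()` is available. The helper class `BoundedWait` and its `await` method are hypothetical names for illustration, not part of Curator, and the timeout value is left to the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Curator API): waits a bounded time for an async result
// instead of blocking forever in an unbounded Future.get().
final class BoundedWait {
    private BoundedWait() {}

    static <T> T await(CompletionStage<T> stage, long timeout, TimeUnit unit)
            throws Exception {
        CompletableFuture<T> future = stage.toCompletableFuture();
        try {
            // Returns the value, or throws TimeoutException if nothing arrives in time
            return future.get(timeout, unit);
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the original failure (e.g. a KeeperException)
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Usage would look like `BoundedWait.await(stage, 30, TimeUnit.SECONDS)`. A timeout here only stops the worker thread from waiting; it does not cancel the underlying ZooKeeper operation, so it complements, rather than replaces, finding out why Curator never completes the future.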
/Ryan
----
Ryan Ruel