Nathan,
We have a single-node Storm cluster (one machine total) running Storm 0.8.2.
From Mule we use drpc.execute to call two different topologies, one after the
other, passing text and a POJO serialized to a string. Rarely, and seemingly
at random, Storm appears to drop a job: the call hangs until we hit the
timeout (conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 600)), and then we get
this exception on the client (Mule) side:
org.apache.thrift7.TApplicationException: execute failed: unknown result
    at backtype.storm.generated.DistributedRPC$Client.recv_execute(DistributedRPC.java:82)
    at backtype.storm.generated.DistributedRPC$Client.execute(DistributedRPC.java:61)
    at backtype.storm.utils.DRPCClient.execute(DRPCClient.java:54)
Analysis of the Storm logs from the worker JVMs indicates that the job
completes just fine; the call simply doesn't return from drpc.execute() until
the timeout, and then it fails with the unhelpful exception above.
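For illustration, here is a simplified sketch of the call sequence described above. It is an assumption-laden stand-in, not our actual Mule code: the DrpcCall interface is hypothetical and stands in for DRPCClient.execute(String, String) so the sketch runs without Storm, the function names topologyA and topologyB are made up, and the client-side deadline is our own addition (not a Storm feature) that would make a hung call fail with a clear TimeoutException instead of waiting for the opaque Thrift error.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DrpcSequenceSketch {
    // Hypothetical stand-in for backtype.storm.utils.DRPCClient#execute(String, String),
    // so this sketch compiles and runs without Storm on the classpath.
    public interface DrpcCall {
        String execute(String function, String args) throws Exception;
    }

    // Same key as Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS ("topology.message.timeout.secs").
    public static Map<String, Object> topologyConf() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.message.timeout.secs", 600);
        return conf;
    }

    // Runs one DRPC call on the pool and waits at most timeoutMs for it.
    // Errors thrown by the call itself propagate wrapped in ExecutionException.
    public static String callWithDeadline(ExecutorService pool, DrpcCall client,
                                          String function, String args,
                                          long timeoutMs) throws Exception {
        Callable<String> task = () -> client.execute(function, args);
        Future<String> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call
            throw new TimeoutException("DRPC call '" + function
                    + "' did not return within " + timeoutMs + " ms");
        }
    }

    // Calls the two (hypothetically named) topologies one after the other:
    // the first with plain text, the second with a POJO serialized to a string.
    public static List<String> runBoth(DrpcCall client, String text,
                                       String serializedPojo,
                                       long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String first = callWithDeadline(pool, client, "topologyA", text, timeoutMs);
            String second = callWithDeadline(pool, client, "topologyB", serializedPojo, timeoutMs);
            return Arrays.asList(first, second);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a real client, the deadline would sit just above the expected processing time, so a call that Storm has silently dropped would be reported as soon as possible rather than after the full 600-second timeout.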
Would upgrading to Storm 0.9.0.1 help with this? Have you seen it before? We
looked at the code for DistributedRPC and can't see how a request could end up
with no result.
Thank you,
Randy