Hi all,

I'd like to collect opinions on our Storm / DRPC implementation to determine 
whether we're on the right track.

We have one "Dispatch" Storm topology that currently consumes data via DRPC: 
about 20 GB/day across roughly 90,000 requests. 
We don't need to return anything to the data producer, so we changed DRPCSpout 
to return an immediate "ack" after receiving a call, to reduce the time the 
client blocks.

This Dispatch topology parses the input data, which is a big block of 
encrypted text containing multiple lines: it decrypts the block and splits it 
into separate lines. It then sends the lines back to the DRPC server as 
separate calls with smaller payloads, using a specific DRPC function depending 
on which topology needs to process each line further (this is determined from 
the content of each line). So one of the original DRPC calls fans out into 
maybe 100 smaller DRPC calls in this step.
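In pseudocode, that dispatch step looks roughly like this (a rough sketch of the routing logic only; decrypt, route, and the function names are illustrative placeholders, not our actual code):

```python
def decrypt(blob: bytes) -> str:
    """Placeholder for the real decryption step; here just a decode."""
    return blob.decode("utf-8")

def route(line: str) -> str:
    """Pick the target DRPC function name from the line's content
    (hypothetical routing rules)."""
    if line.startswith("order"):
        return "process_orders"
    return "process_misc"

def dispatch(blob: bytes) -> list[tuple[str, str]]:
    """Decrypt one incoming DRPC payload and fan it out into
    (drpc_function_name, line) pairs, one call per line."""
    return [(route(line), line) for line in decrypt(blob).splitlines() if line]
```

Each resulting pair is then sent as its own DRPC call (function name plus the single-line payload) for one of the Processing topologies to pick up.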

Currently four "Processing" topologies query the DRPC server, each with its 
own function name, and process the line-based data (again, the DRPC response 
is returned immediately as "ack").


It kind of feels like we're misusing DRPC as a message queue and that we'd be 
better off switching to something like Kafka. I'm also afraid the DRPCClient 
in the Dispatch topology blocks until a Processing topology picks the call up. 
So far it seems to work okay, but I'm worried about the higher loads we expect 
in the future. Interested in opinions.


Kind regards,
Jori