A bit more information then
There are 4 components:
Spout - This reads from an Azure Service Bus topic/subscription. A connection is created in the open() method of the spout; nextTuple() does a peek on the message and invokes the following code:
StringWriter writer = new StringWriter();
IOUtils.copy(message.getBody(), writer);
String messageBody = writer.toString();
It then deletes the message from the queue.
Overall, nothing all that exciting.
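For context, the spout is roughly the following shape; the ServiceBusClient/ReceivedMessage types are an illustrative stand-in for the actual Azure SDK calls, not our real code:

import java.io.IOException;
import java.io.StringWriter;
import java.util.Map;
import org.apache.commons.io.IOUtils;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class ServiceBusSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private ServiceBusClient client; // illustrative wrapper around the Service Bus SDK

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.client = ServiceBusClient.connect("namespace", "topic", "subscription"); // connection created once here
    }

    @Override
    public void nextTuple() {
        ReceivedMessage message = client.peek(); // peek the next message, if any
        if (message == null) {
            return; // nothing available; Storm applies its wait strategy
        }
        try {
            StringWriter writer = new StringWriter();
            IOUtils.copy(message.getBody(), writer); // body InputStream -> String
            String messageBody = writer.toString();
            client.delete(message); // remove the message from the subscription
            collector.emit(new Values(messageBody));
        } catch (IOException e) {
            collector.reportError(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("messageBody"));
    }
}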
Bolt 1 - Filtering
Parses the message body (a JSON string) and converts it to an object representation. Filters out anything that isn't a monetise message, then emits the monetise message object to the next bolt. Monetise messages account for ~0.03% of the total message volume.
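The filtering bolt is essentially this shape (JSON parsing via Gson and the Message POJO are illustrative placeholders for our actual code):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import com.google.gson.Gson;

public class FilterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Gson gson;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.gson = new Gson();
    }

    @Override
    public void execute(Tuple tuple) {
        String body = tuple.getStringByField("messageBody");
        Message parsed = gson.fromJson(body, Message.class); // Message is an illustrative POJO with a type field
        if (parsed != null && "monetise".equals(parsed.getType())) { // keep only monetise messages (~0.03%)
            collector.emit(tuple, new Values(parsed));               // anchored emit to the next bolt
        }
        collector.ack(tuple);                                        // ack whether we emitted or not
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("monetiseMessage"));
    }
}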
Bolt 2 - Transformation
Basically extracts the values of interest from the monetise object and constructs a string, which it emits.
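Its execute() method is little more than this (the field names are made up for illustration):

@Override
public void execute(Tuple tuple) {
    Message m = (Message) tuple.getValueByField("monetiseMessage");
    // pull out the interesting values and build a delimited string (fields illustrative)
    String transformed = m.getUserId() + "|" + m.getAmount() + "|" + m.getCurrency();
    collector.emit(tuple, new Values(transformed));
    collector.ack(tuple);
}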
Bolt 3 - Storage
Stores the transformed string in Redis using the current date/time as key.
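The storage bolt's execute() boils down to something like this (shown with a Jedis client for illustration, created in prepare(); the key format is just indicative):

@Override
public void execute(Tuple tuple) {
    String transformed = tuple.getStringByField("transformed");
    String key = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").format(new Date()); // current date/time as key
    jedis.set(key, transformed); // plain SET into Redis
    collector.ack(tuple);
}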
-----
Shuffle grouping is used throughout the topology.
I ack every tuple irrespective of whether I emit it or not, so it should not be attempting to replay tuples.
-----
I don't think Bolt 2/3 are the cause of the bottleneck. They don't have to
process much data at all tbh.
I can accept that perhaps there is something inefficient with the spout,
perhaps it just can't read from the service bus quickly enough. I will do some
more research on this and have a chat with the colleague who wrote this
component.
I suppose I'm just trying to identify whether I've configured something incorrectly with respect to Storm, and whether I'm correct to relate the total number of executors and tasks to the total number of cores I have available. I find it strange that I get better throughput when I choose an arbitrarily large number for the parallelism hint than if I constrain myself to a maximum that equates to the number of cores.
D
From: Nathan Leung <[email protected]>
Sent: Tuesday, 18 March 2014 18:38
To: [email protected]
In my experience Storm is able to make good use of CPU resources if the application is written appropriately. You shouldn't require too much executor parallelism if your application is CPU intensive. If your bolts are doing things like remote DB/NoSQL accesses, then that changes things and parallelizing the bolts will give you more throughput. Not knowing your application, the best way to pin down the problem is to simplify your topology. Cut out everything except for the spout. How is your filtering done? If you return without emitting, the latest versions of Storm will sleep before trying again. It may be worthwhile to loop in the spout until you receive a valid message, or the bus is empty. How much throughput can you achieve from the spout, emitting a tuple into the ether? Maybe the problem is your message bus. Once you have achieved a level of performance you are satisfied with from the spout, add one bolt. What bottlenecks does the bolt introduce? etc etc.
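Something like this for nextTuple, as a rough sketch (peek/delete/readBody/isInteresting stand in for whatever your Service Bus wrapper and filter logic provide):

@Override
public void nextTuple() {
    // Keep pulling until there is something to emit or the bus is empty, so we don't
    // return empty-handed and trigger the spout wait (sleep) strategy unnecessarily.
    while (true) {
        ReceivedMessage message = client.peek();
        if (message == null) {
            return;                        // bus empty: let Storm back off
        }
        String body = readBody(message);   // the IOUtils copy, factored into a helper
        client.delete(message);
        if (isInteresting(body)) {         // optional cheap pre-filter in the spout
            collector.emit(new Values(body));
            return;                        // emitted a valid message, done for this call
        }
        // otherwise drop it and try the next message
    }
}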
On Tue, Mar 18, 2014 at 2:31 PM, David Crossland <[email protected]> wrote:
Could my issue relate to memory allocated to the JVM? Most of the settings are pretty much the defaults. Are there any other settings that could be throttling the topology?
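For what it's worth, the only memory-related knob I'm aware of is the worker JVM options in storm.yaml, which I believe default to a fairly small heap; raising it would look something like:

# storm.yaml on each supervisor node (the default is, I believe, around -Xmx768m)
worker.childopts: "-Xmx2048m"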
I'd like to be able to identify the issue without all this constant “stabbing
in the dark”… 😃
D
From: David Crossland <[email protected]>
Sent: Tuesday, 18 March 2014 16:32
To: [email protected]
Being very new to storm I'm not sure what to expect in some regards.
I've been playing about with the number of workers/executors/tasks, trying to improve throughput on my cluster. I have 3 nodes: two 4-core nodes and one 2-core node (I can't increase the 3rd node to a medium until the customer gets more cores..). There is a spout that reads from a message bus and a bolt that filters out all but the messages we are interested in processing downstream in the topology. Most messages are filtered out.
I'm assuming that these two components require the most resources, as they should be reading/filtering messages at a constant rate; there are two further bolts that are invoked intermittently and hence require less.
I've set the number of workers to 12 (in fact I've noticed it rarely seems to make much difference whether I set this to 3, 6, 9 or 12; there is marginal improvement the higher the value).
For the spout and filtering bolt I've tried a number of values for the parallelism hint (I started with 1/4/8/16/20/30/40/60/128/256…) and I can barely get the throughput to exceed 3500 messages per minute, and the larger hints just grind the system to a halt. Currently I have them both set to 20.
The strange thing is that no matter what I do the CPU load is very low, typically 80-90% idle, suggesting that the topology isn't doing that much work. And tbh I've no idea why this is. Can anyone offer any suggestions?
If I understand how this should work, given the number of cores I would think the total number of executors should total 10? Spawning 1 thread per core, I can then set a number of tasks, say 2/4/8 per thread (I've no idea which would be most efficient..). I've tried something along these lines and my throughput was significantly less, approx. 2000 messages per minute. In fact parallelism hints of 20/24/30 seem to have produced the most throughput.
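For reference, the topology is wired up roughly like this (component and class names are placeholders; the numbers mirror my current settings):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("service-bus-spout", new ServiceBusSpout(), 20);       // parallelism hint = executors; .setNumTasks(n) would set tasks separately
builder.setBolt("filter", new FilterBolt(), 20).shuffleGrouping("service-bus-spout");
builder.setBolt("transform", new TransformBolt(), 2).shuffleGrouping("filter");
builder.setBolt("store", new RedisStorageBolt(), 2).shuffleGrouping("transform");

Config conf = new Config();
conf.setNumWorkers(12);                                                 // worker JVM processes across the 3 nodes
StormSubmitter.submitTopology("monetise-topology", conf, builder.createTopology());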
All help/suggestions gratefully received!
Thanks
David