It could be bolt 3. What is the latency like between your worker and your Redis server? Increasing the number of threads for bolt 3 will likely increase your throughput. Bolts 1 and 2 are probably CPU bound, but bolt 3 is probably restricted by your network access. Also, I've found that localOrShuffleGrouping can improve performance due to reduced network communication.
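For illustration, a minimal sketch of that kind of wiring with a TopologyBuilder. The spout/bolt class names (ServiceBusSpout, FilterBolt, TransformBolt, RedisStoreBolt), the parallelism hints and the worker count are hypothetical placeholders, not values taken from your topology:

    // Sketch only: class names and numbers are illustrative, the point is giving the
    // Redis-bound bolt more executors and using localOrShuffleGrouping to keep
    // transfers inside a worker where possible.
    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class MonetiseTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            builder.setSpout("bus-spout", new ServiceBusSpout(), 4);

            // Bolts 1 and 2 are probably CPU bound, so modest parallelism is enough.
            builder.setBolt("filter", new FilterBolt(), 4)
                   .localOrShuffleGrouping("bus-spout");
            builder.setBolt("transform", new TransformBolt(), 2)
                   .localOrShuffleGrouping("filter");

            // Bolt 3 spends its time waiting on Redis over the network,
            // so give it more executors than the CPU-bound bolts.
            builder.setBolt("redis-store", new RedisStoreBolt(), 16)
                   .localOrShuffleGrouping("transform");

            Config conf = new Config();
            conf.setNumWorkers(3); // e.g. one worker per node
            StormSubmitter.submitTopology("monetise-topology", conf, builder.createTopology());
        }
    }

The exact numbers matter less than the shape: the bolt that waits on Redis is the one that benefits from extra executors and from keeping tuple transfers local.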
On Tue, Mar 18, 2014 at 3:55 PM, David Crossland <[email protected]> wrote:

> A bit more information then.
>
> There are 4 components.
>
> Spout - This is reading from an Azure Service Bus topic/subscription. A connection is created in the open() method of the spout; nextTuple does a peek on the message and invokes the following code:
>
> StringWriter writer = new StringWriter();
> IOUtils.copy(message.getBody(), writer);
> String messageBody = writer.toString();
>
> It then deletes the message from the queue. Overall nothing all that exciting.
>
> Bolt 1 - Filtering
>
> Parses the message body (a JSON string) and converts it to an object representation. Filters out anything that isn't a monetise message, then emits the monetise message object to the next bolt. Monetise messages account for ~0.03% of the total message volume.
>
> Bolt 2 - Transformation
>
> Extracts the values of interest from the monetise object and constructs a string, which it emits.
>
> Bolt 3 - Storage
>
> Stores the transformed string in Redis using the current date/time as the key.
>
> -----
>
> Shuffle grouping is used in the topology.
>
> I ack every tuple irrespective of whether I emit it or not, so it should not be attempting to replay tuples.
>
> -----
>
> I don't think bolts 2/3 are the cause of the bottleneck; they don't have to process much data at all, tbh.
>
> I can accept that perhaps there is something inefficient with the spout; perhaps it just can't read from the Service Bus quickly enough. I will do some more research on this and have a chat with the colleague who wrote this component.
>
> I suppose I'm just trying to identify whether I've configured something incorrectly with respect to Storm, and whether I'm correct to relate the total number of executors and tasks to the total number of cores I have available. I find it strange that I get better throughput when I choose an arbitrarily large number for the parallelism hint than when I constrain myself to a maximum that equates to the number of cores.
>
> D
>
> *From:* Nathan Leung <[email protected]>
> *Sent:* Tuesday, 18 March 2014 18:38
> *To:* [email protected]
>
> In my experience Storm is able to make good use of CPU resources if the application is written appropriately. You shouldn't require too much executor parallelism if your application is CPU intensive. If your bolts are doing things like remote DB/NoSQL accesses, then that changes things, and parallelizing bolts will give you more throughput. Not knowing your application, the best way to pin down the problem is to simplify your topology. Cut out everything except for the spout. How is your filtering done? If you return without emitting, the latest versions of Storm will sleep before trying again. It may be worthwhile to loop in the spout until you receive a valid message or the bus is empty. How much throughput can you achieve from the spout, emitting a tuple into the ether? Maybe the problem is your message bus. Once you have achieved a level of performance you are satisfied with from the spout, add one bolt. What bottlenecks does that bolt introduce? And so on.
>
> On Tue, Mar 18, 2014 at 2:31 PM, David Crossland <[email protected]> wrote:
>
>> Could my issue relate to memory allocated to the JVM? Most of the settings are pretty much the defaults. Are there any other settings that could be throttling the topology?
>> I'd like to be able to identify the issue without all this constant “stabbing in the dark”… 😃
>>
>> D
>>
>> *From:* David Crossland <[email protected]>
>> *Sent:* Tuesday, 18 March 2014 16:32
>> *To:* [email protected]
>>
>> Being very new to Storm, I'm not sure what to expect in some regards.
>>
>> I've been playing about with the number of workers/executors/tasks, trying to improve throughput on my cluster. I have 3 nodes: two 4-core nodes and one 2-core node (I can't increase the 3rd node to a medium until the customer gets more cores). There is a spout that reads from a message bus and a bolt that filters out all but the messages we are interested in processing downstream in the topology. Most messages are filtered out.
>>
>> I'm assuming that these two components require the most resources, as they should be reading/filtering messages at a constant rate; there are two further bolts that are invoked intermittently and hence require less.
>>
>> I've set the number of workers to 12 (in fact I've noticed it rarely seems to make much difference whether I set this to 3, 6, 9 or 12; there is marginal improvement the higher the value).
>>
>> For the spout and the filtering bolt I've tried a number of values for the parallelism hint (I started with 1/4/8/16/20/30/40/60/128/256…) and I can barely get the throughput to exceed 3500 messages per minute, and the larger hints just grind the system to a halt. Currently I have them both set to 20.
>>
>> The strange thing is that no matter what I do, the CPU load is very low and is typically 80-90% idle, suggesting that the topology isn't doing that much work. And tbh I've no idea why this is. Can anyone offer any suggestions?
>>
>> If I understand how this should work, given the number of cores I would think the total number of executors should total 10, spawning 1 thread per core; I could then set the number of tasks to, say, 2/4/8 per thread (I've no idea which would be most efficient). I've tried something along these lines and my throughput was significantly less, approx. 2000 messages per minute. In fact parallelism hints of 20/24/30 seem to have produced the most throughput.
>>
>> All help/suggestions gratefully received!
>>
>> Thanks
>> David
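As a concrete illustration of the "loop in the spout until you receive a valid message or the bus is empty" suggestion earlier in the thread, here is a minimal sketch of such a spout. The MessageBusClient interface is a hypothetical stand-in for the real Azure Service Bus client, not its actual API:

    // Sketch only: keep pulling inside nextTuple() until a tuple can be emitted or
    // the bus is genuinely empty, instead of returning empty-handed (which makes
    // Storm sleep before the next nextTuple() call).
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.StringWriter;
    import java.util.Map;

    import org.apache.commons.io.IOUtils;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class ServiceBusSpout extends BaseRichSpout {

        /** Hypothetical wrapper around the Service Bus subscription. */
        public interface MessageBusClient {
            InputStream peekNextBody(); // null when the subscription is empty
            void deleteCurrent();       // delete the message just peeked
        }

        private transient SpoutOutputCollector collector;
        private transient MessageBusClient client;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.client = connectToServiceBus(); // as in the original spout, connect in open()
        }

        private MessageBusClient connectToServiceBus() {
            // Placeholder: wrap the real Azure Service Bus subscription client here.
            throw new UnsupportedOperationException("wire up the real Service Bus client");
        }

        @Override
        public void nextTuple() {
            // Only return without emitting when the bus is actually empty, so Storm's
            // between-calls sleep does not throttle reads from a busy queue.
            InputStream body;
            while ((body = client.peekNextBody()) != null) {
                try {
                    StringWriter writer = new StringWriter();
                    IOUtils.copy(body, writer);
                    client.deleteCurrent();
                    collector.emit(new Values(writer.toString()));
                    return;
                } catch (IOException e) {
                    client.deleteCurrent(); // drop unreadable messages and keep looping
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("messageBody"));
        }
    }

Acking and message anchoring are left out for brevity; the only point here is the looping structure inside nextTuple().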
