Do you filter out messages inside the nextTuple method, so that some calls to nextTuple emit no tuple? Storm has an internal mechanism that sleeps (1 ms by default) before calling nextTuple again whenever a call to nextTuple emits nothing.
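For illustration, a minimal spout sketch along those lines is below. The BusClient and BusMessage types are hypothetical stand-ins for the real Azure Service Bus client (not an actual API), and the contains("monetise") check is a crude placeholder for the real filter; the point is only that nextTuple keeps pulling until it either emits something or the bus is empty, so the 1 ms sleep only happens when there is genuinely nothing to do.

    import java.io.IOException;
    import java.io.StringWriter;
    import java.util.Map;

    import org.apache.commons.io.IOUtils;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class ServiceBusSpout extends BaseRichSpout {

        private SpoutOutputCollector collector;
        private BusClient busClient;   // hypothetical wrapper around the Azure Service Bus API

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.busClient = BusClient.connect();   // placeholder for the real connection code
        }

        @Override
        public void nextTuple() {
            // Keep pulling until something is emitted or the bus is empty, so Storm's
            // sleep-on-empty behaviour only kicks in when there is genuinely no work.
            while (true) {
                BusMessage message = busClient.receive();   // hypothetical call
                if (message == null) {
                    return;                                  // bus empty -> let Storm sleep
                }
                String body;
                try {
                    StringWriter writer = new StringWriter();
                    IOUtils.copy(message.getBody(), writer);
                    body = writer.toString();
                } catch (IOException e) {
                    busClient.abandon(message);              // hypothetical call
                    continue;
                }
                busClient.delete(message);                   // hypothetical call
                if (body.contains("monetise")) {             // crude stand-in for the real filter
                    collector.emit(new Values(body));
                    return;                                  // emitted -> no sleep this round
                }
                // Not a monetise message: loop and try the next one instead of returning empty.
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("messageBody"));
        }
    }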
On 19 March 2014 at 7:14 AM, David Crossland <[email protected]> wrote:
>
> Perhaps these screenshots might shed some light? I don't think there is much of a latency issue. I'm really starting to suspect there is some consumption-rate issue from the topic.
>
> I set the spout to a high parallelism value as it did seem to improve throughput.
>
> But if there is anything you can spot, that would be grand.
>
> Thanks
> David
>
> From: Nathan Leung
> Sent: Tuesday, 18 March 2014 21:14
> To: [email protected]
>
> It could be bolt 3. What is the latency like between your worker and your Redis server? Increasing the number of threads for bolt 3 will likely increase your throughput. Bolts 1 and 2 are probably CPU bound, but bolt 3 is probably restricted by your network access. Also, I've found that localOrShuffleGrouping can improve performance due to reduced network communication.
>
>
> On Tue, Mar 18, 2014 at 3:55 PM, David Crossland <[email protected]> wrote:
>>
>> A bit more information then.
>>
>> There are 4 components:
>>
>> Spout - This reads from an Azure Service Bus topic/subscription. A connection is created in the open() method of the spout; nextTuple does a peek on the message and invokes the following code:
>>
>> StringWriter writer = new StringWriter();
>> IOUtils.copy(message.getBody(), writer);
>> String messageBody = writer.toString();
>>
>> It then deletes the message from the queue.
>>
>> Overall, nothing all that exciting.
>>
>> Bolt 1 - Filtering
>>
>> Parses the message body (a JSON string) and converts it to an object representation. Filters out anything that isn't a monetise message, then emits the monetise message object to the next bolt. Monetise messages account for ~0.03% of the total message volume.
>>
>> Bolt 2 - Transformation
>>
>> Basically extracts the interesting values from the monetise object and constructs a string, which it emits.
>>
>> Bolt 3 - Storage
>>
>> Stores the transformed string in Redis using the current date/time as the key.
>>
>> -----
>>
>> Shuffle grouping is used in the topology.
>>
>> I ack every tuple irrespective of whether I emit it or not, so it should not be attempting to replay tuples.
>>
>> -----
>>
>> I don't think bolts 2/3 are the cause of the bottleneck; they don't have to process much data at all, tbh.
>>
>> I can accept that perhaps there is something inefficient about the spout; perhaps it just can't read from the Service Bus quickly enough. I will do some more research on this and have a chat with the colleague who wrote this component.
>>
>> I suppose I'm just trying to identify whether I've configured something incorrectly with respect to Storm, and whether I'm correct to relate the total number of executors and tasks to the total number of cores I have available. I find it strange that I get better throughput when I choose an arbitrarily large number for the parallelism hint than if I constrain myself to a maximum that equates to the number of cores.
>>
>> D
>>
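For reference, bolt 3 as described above might look something like the following minimal sketch. It assumes a plain Jedis client and a hypothetical "transformed" field name for bolt 2's output (neither is stated in the thread); the point it illustrates is opening the Redis connection once in prepare() rather than per tuple, and acking every tuple in execute(), as described above.

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    import redis.clients.jedis.Jedis;

    public class RedisStorageBolt extends BaseRichBolt {

        private OutputCollector collector;
        private Jedis jedis;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // Connect once per executor, not once per tuple.
            this.jedis = new Jedis("redis.example.com", 6379);   // placeholder host/port
        }

        @Override
        public void execute(Tuple tuple) {
            // "transformed" is a hypothetical field name for bolt 2's output.
            String value = tuple.getStringByField("transformed");
            String key = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").format(new Date());
            jedis.set(key, value);
            collector.ack(tuple);   // ack every tuple, whether or not anything is emitted downstream
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: nothing is emitted downstream.
        }
    }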
>> From: Nathan Leung
>> Sent: Tuesday, 18 March 2014 18:38
>> To: [email protected]
>>
>> In my experience Storm is able to make good use of CPU resources if the application is written appropriately. You shouldn't require too much executor parallelism if your application is CPU intensive. If your bolts are doing things like remote DB/NoSQL accesses, then that changes things, and parallelizing the bolts will give you more throughput. Not knowing your application, the best way to pin down the problem is to simplify your topology. Cut out everything except for the spout. How is your filtering done? If you return without emitting, the latest versions of Storm will sleep before trying again. It may be worthwhile to loop in the spout until you receive a valid message or the bus is empty. How much throughput can you achieve from the spout, emitting a tuple into the ether? Maybe the problem is your message bus. Once you have achieved a level of performance you are satisfied with from the spout, add one bolt. What bottlenecks does that bolt introduce? And so on.
>>
>>
>> On Tue, Mar 18, 2014 at 2:31 PM, David Crossland <[email protected]> wrote:
>>>
>>> Could my issue relate to memory allocated to the JVM? Most of the settings are pretty much the defaults. Are there any other settings that could be throttling the topology?
>>>
>>> I'd like to be able to identify the issue without all this constant "stabbing in the dark"… 😃
>>>
>>> D
>>>
>>> From: David Crossland
>>> Sent: Tuesday, 18 March 2014 16:32
>>> To: [email protected]
>>>
>>> Being very new to Storm, I'm not sure what to expect in some regards.
>>>
>>> I've been playing about with the number of workers/executors/tasks, trying to improve throughput on my cluster. I have 3 nodes: two 4-core nodes and one 2-core node (I can't increase the 3rd node to a medium until the customer gets more cores..). There is a spout that reads from a message bus and a bolt that filters out all but the messages we are interested in processing downstream in the topology. Most messages are filtered out.
>>>
>>> I'm assuming that these two components require the most resources, as they should be reading/filtering messages at a constant rate; there are two further bolts that are invoked intermittently and hence require less.
>>>
>>> I've set the number of workers to 12 (in fact I've noticed it rarely seems to make much difference whether I set this to 3/6/9 or 12; there is marginal improvement the higher the value).
>>>
>>> For the spout and the filtering bolt I've tried a number of values for the parallelism hint (I started with 1/4/8/16/20/30/40/60/128/256… ) and I can barely get the throughput to exceed 3500 messages per minute. The larger hints just grind the system to a halt. Currently I have them both set to 20.
>>>
>>> The strange thing is that no matter what I do, the CPU load is very low and typically 80-90% idle, suggesting that the topology isn't doing that much work. And tbh I've no idea why this is. Can anyone offer any suggestions?
>>>
>>> If I understand how this should work, given the number of cores, I would think the total number of executors should total 10? Spawning 1 thread per core, I could then set the number of tasks to, say, 2/4/8 per thread (I've no idea which would be most efficient..). I've tried something along these lines and my throughput was significantly less, approx. 2000 messages per minute. In fact, parallelism hints of 20/24/30 seem to have produced the most throughput.
>>>
>>> All help/suggestions gratefully received!
>>>
>>> Thanks
>>> David
>>>
>>
>
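Tying the suggestions in this thread together, a wiring sketch for the 3-node / 10-core cluster described above might look like the following. The FilterBolt and TransformBolt classes are assumed stand-ins for bolts 1 and 2, ServiceBusSpout and RedisStorageBolt refer to the sketches earlier in this message, and the specific parallelism numbers are illustrative, not tuned values. The points demonstrated are localOrShuffleGrouping to cut network hops, extra executors on the network-bound Redis bolt, one worker per node, and keeping the total executor count in the same ballpark as the core count.

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class MonetiseTopology {

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Parallelism hints are illustrative for ~10 cores (2 + 4 + 1 + 3 = 10 executors).
            builder.setSpout("bus-spout", new ServiceBusSpout(), 2);
            builder.setBolt("filter", new FilterBolt(), 4)
                   .localOrShuffleGrouping("bus-spout");   // prefer tuples staying in-process
            builder.setBolt("transform", new TransformBolt(), 1)
                   .localOrShuffleGrouping("filter");
            builder.setBolt("store", new RedisStorageBolt(), 3)
                   .shuffleGrouping("transform");          // network-bound, so more executors help

            Config conf = new Config();
            conf.setNumWorkers(3);   // one worker per node rather than 12
            // The sleep applied when nextTuple emits nothing; 1 ms is the default mentioned above.
            conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 1);

            StormSubmitter.submitTopology("monetise-topology", conf, builder.createTopology());
        }
    }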
