Hi all,
My environment and configuration are as follows:
[Machines] 1 Nimbus and 3 supervisors on AWS (m1.medium instances)
[Topology] 4 spouts (one per Kafka topic, each with parallelism hint 2) and 10
bolts
[Topology] 6 workers, 34 executors, 34 tasks
My first bolt (parallelism hint = 5) parses data from the spout, and its
capacity is often over 1.0. My concerns are as follows:
1. Using the tick-tuple feature to write my results into a MySQL database:
    if (TupleHelpers.isTickTuple(tuple)) {
        // emit the accumulated result to the next bolt (no anchor is passed)
        collector.emit(new Values(result));
    } else {
        // store the result in memory, then ack the input tuple
        collector.ack(tuple);
    }
I set TOPOLOGY_TICK_TUPLE_FREQ_SECS to 30 seconds. Is it correct to emit the
result unanchored, so that the emitted tuple is not tracked? I'm afraid
something is wrong here.
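To make the buffering part of point 1 concrete, here is a minimal plain-Java sketch of what I mean by "store result in memory" and "emit on tick". It deliberately uses no Storm classes; TickBuffer, onDataTuple, and onTick are hypothetical names for illustration only, and the per-key count is just an assumed example of a "result".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: accumulates results between ticks.
final class TickBuffer {
    private final Map<String, Long> counts = new HashMap<>();

    // Called for every normal (non-tick) tuple: update the in-memory result.
    void onDataTuple(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Called when a tick tuple arrives: hand back a snapshot and reset the buffer.
    Map<String, Long> onTick() {
        Map<String, Long> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }
}
```

In the real bolt, onDataTuple would be called in the else branch before collector.ack(tuple), and onTick in the tick-tuple branch just before the emit. Note that anything still in the buffer is lost if the worker dies between ticks, since the input tuples were already acked.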
2. Is one KafkaSpout per topic a bad approach?
I will actually use 12 topics, so I will have 12 spouts in my topology. Is it
good practice to use one spout per topic?
3. My topology is slow.
One of my bolts is connected directly to a spout and counts the number of
tuples it receives. I found it can process only 300~400 tuples/sec. What's
wrong with my topology?
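For context on the capacity figure I mentioned for the first bolt: as far as I understand, the Storm UI computes it as the number of tuples executed times the average execute latency, divided by the length of the measurement window (the last 10 minutes). A value near or above 1.0 would mean the executors are busy for the whole window. The sketch below only restates that arithmetic; the class name and the input numbers are hypothetical and are not measurements from my cluster.

```java
// Sketch of the capacity arithmetic as I understand it from the Storm UI.
final class CapacityEstimate {

    // capacity = executed tuples * average execute latency (ms) / window length (ms)
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return executed * executeLatencyMs / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 10 * 60 * 1000L; // 10-minute window, in milliseconds
        // Hypothetical inputs: 240,000 tuples executed at 3 ms each.
        System.out.println(capacity(240_000L, 3.0, windowMs));
    }
}
```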
[Storm UI]
Right after the topology starts, the complete latency is over 30000 ms, and
there are lots of failed tuples under "spouts" but none under "bolts". Can
anyone give me some advice on how to speed up my topology?
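One thing I noticed while writing this up: 30000 ms is exactly the default value of topology.message.timeout.secs (30 seconds), so the spout-side failures may simply be tuples timing out rather than bolts failing them. That is only my guess, and it assumes the timeout is still at its default. The sketch below does the rough arithmetic, using throughput times latency as an estimate of how many tuples are in flight; the class and method names are hypothetical.

```java
// Rough arithmetic only; names are hypothetical and nothing here calls Storm.
final class InFlightEstimate {

    // Little's law: tuples in flight ~= throughput (tuples/sec) * latency (sec).
    static double inFlightTuples(double tuplesPerSec, double completeLatencyMs) {
        return tuplesPerSec * completeLatencyMs / 1000.0;
    }

    // True when the complete latency has reached the message timeout.
    static boolean reachesTimeout(double completeLatencyMs, int messageTimeoutSecs) {
        return completeLatencyMs >= messageTimeoutSecs * 1000.0;
    }

    public static void main(String[] args) {
        // 350 tuples/sec is the midpoint of the 300~400 range I observed.
        System.out.println(inFlightTuples(350.0, 30_000.0));
        // 30 is the assumed (default) message timeout in seconds.
        System.out.println(reachesTimeout(30_000.0, 30));
    }
}
```

If that estimate is right, roughly ten thousand tuples are pending at once, which makes me wonder whether I should be limiting the number of pending tuples per spout.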
Best regards,
James