Hi,

Perhaps MySQL is the bottleneck; I'll try that. However, if one bolt is very 
busy, will Storm emit tuples more slowly? My messages are Avro records from 
Kafka, and each one is about 3 KB. What types of messages do you fetch 
from Kafka?

Another important question: which kafka-storm library do you use? I see so 
many different versions of it that I am confused. Could you share your 
topology's Storm config and your KafkaSpout config with me?

Thank you very much!


Best regards,
James Fu



> Danijel Schiavuzzi <[email protected]> wrote on 2014/6/25 at 12:02 AM:
> 
> Try to run the topology without the MySQL bolt to find out if that's the 
> bottleneck. Do you update the database in batches?  That's an essential 
> optimization you should implement.
> 
> With a two node Storm cluster I can fetch 450 000 messages/s from Kafka, and 
> that's with a Trident transactional topology (just the spout and a debug 
> filter bolt). Kafka has two nodes with 4 partitions only. Basic Storm should 
> be faster.
> On Jun 24, 2014 4:12 PM, <[email protected]> wrote:
> >
> > Hi all,
> >
> > I am facing a critical performance problem with my Storm topology. I can 
> > only process 1000 tuples/sec from Kafka with KafkaSpout. I use standard Storm 
> > for my topology (not Trident), and my topology information is as follows:
> > [Machines]
> > I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE (Google 
> > Compute Engine)
> > Number of workers: 12
> > Number of executors: 51
> > [Topology]
> > Number of kafkaSpouts: 13 (fetching 13 topics from the Kafka brokers)
> > Number of bolts: 12 (5 of them are mysql-dumper bolts)
> >
> > KafkaSpout (topic) emits to boltA and boltB
> > boltA (parallelism=9): parses the Avro tuple from KafkaSpout
> > boltB (parallelism=1): counting only
> >
> > I found that boltA's capacity is sometimes 1 or above in the Storm UI, and 
> > my 5 mysql-dumper bolts' execute latency is more than 300 ms (the other bolts 
> > are below 10 ms). In addition, the complete latency of these KafkaSpouts is 
> > more than 2000 ms at the beginning, but it drops to 1000 ms after a while.
> >
> > I found that this topology can only process 1000 tuples/s or less, but my 
> > goal is to process 10000 tuples/s. Is anything wrong with my topology config? 
> > Actually, my topology only does simple things like counting and dumping to 
> > MySQL. Storm does not seem to perform as well as advertised (millions of 
> > tuples per second on a 10-node cluster). Can anyone give me some suggestions?
> >
> > Thanks a lot.
> >
> > Best regards,
> > James
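
The batching advice in the reply above ("Do you update the database in
batches?") can be sketched in Java. This is only a minimal illustration under
stated assumptions, not code from anyone in this thread: the names BatchBuffer
and BatchDemo and the batch size of 100 are hypothetical, and the sink here
just records each batch in a list. In a real mysql-dumper bolt the sink would
presumably issue one JDBC batch per flush (PreparedStatement.addBatch()
followed by executeBatch()), and the buffered tuples would be acked only after
that write succeeds. Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects rows and hands them to a sink in groups. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Called once per incoming row, e.g. from a bolt's execute(). */
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes out whatever is buffered; also call this periodically so a
     *  partial batch does not wait forever. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the sink cannot be affected by the clear() below.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}

class BatchDemo {
    public static void main(String[] args) {
        // Stand-in for the database: each element is one simulated round trip.
        List<List<String>> writes = new ArrayList<>();
        BatchBuffer<String> buffer = new BatchBuffer<>(100, writes::add);

        for (int i = 0; i < 250; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush(); // write the final partial batch

        System.out.println(writes.size() + " round trips for 250 rows");
    }
}
```

With these numbers the demo prints "3 round trips for 250 rows": two full
batches of 100 plus a final partial batch of 50, instead of 250 separate
writes. The right batch size for a given database and latency budget is
something to measure rather than assume.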
