On serialization, make sure your custom classes are registered with Kryo; otherwise it may fall back to Java serialization, which is slow.

On Jun 25, 2014 10:30 AM, "Robert Turner" <[email protected]> wrote:
> Serialisation across workers might be your problem. If you can use the
> "localOrShuffle" grouping and arrange that the number of spouts and bolts
> is a multiple of the number of workers, this will minimise the
> serialisation across workers. If there is only one counting bolt for the
> topology, then tuples are serialised and sent to the worker with the single
> counting bolt. A better approach might be to have a single counting bolt
> per worker and aggregate their counts periodically.
>
> Regards
> Rob Turner.
>
>
> On 24 June 2014 15:10, <[email protected]> wrote:
>
>> Hi all,
>>
>> I am facing a critical performance problem with my Storm topology. I can
>> only process 1000 tuples/sec from Kafka using KafkaSpout. I use standard
>> Storm (not Trident) to build my topology, and my topology information is
>> as follows:
>> [Machines]
>> I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE (Google
>> Compute Engine)
>> Number of workers: 12
>> Number of executors: 51
>> [Topology]
>> Number of KafkaSpouts: 13 (fetching 13 topics from the Kafka brokers)
>> Number of bolts: 12 (5 of them are mysql-dumper bolts)
>>
>> KafkaSpout(topic) emits to boltA and boltB
>> boltA (parallelism=9): parses the Avro tuples from KafkaSpout
>> boltB (parallelism=1): does counting only
>>
>> I found that boltA's capacity is sometimes 1 or above in the Storm UI, and
>> the execute latency of my 5 mysql-dumper bolts is more than 300 ms (other
>> bolts are under 10 ms). In addition, the complete latency of these
>> KafkaSpouts is more than 2000 ms at the beginning, but it drops to 1000 ms
>> after a while.
>>
>> This topology can only process 1000 tuples/s or less, but my goal is to
>> process 10000 tuples/s. Is something wrong with my topology config? My
>> topology only does simple things like counting and dumping to MySQL.
>> Storm doesn't seem to perform as well as advertised (millions of tuples
>> per second on a 10-node cluster). Can anyone give me some suggestions?
>>
>> Thanks a lot.
>>
>> Best regards,
>> James
>
>
> --
> Cheers
> Rob.
>
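Rob's suggestion of one counting bolt per worker with periodic aggregation can be illustrated without Storm itself. This is a minimal sketch, not Storm code. Plain Java objects stand in for workers and bolts, and all class and method names are hypothetical rather than Storm APIs. Flushing happens every N tuples instead of on a timer so that the result is deterministic. The sketch shows how partial counts reduce cross-worker traffic from one message per tuple to one message per flush.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java objects stand in for Storm workers and bolts.
// No Storm APIs are used; names here are illustrative only.
final class PerWorkerCounting {

    // Stand-in for the single downstream bolt that sums partial counts.
    static final class Aggregator {
        private long total = 0;
        private long messagesReceived = 0;

        // Each call models one message crossing a worker boundary.
        void receivePartial(long partial) {
            total += partial;
            messagesReceived++;
        }

        long total() { return total; }
        long messagesReceived() { return messagesReceived; }
    }

    // Stand-in for a counting bolt that lives inside one worker.
    static final class LocalCounter {
        private final Aggregator aggregator;
        private final long flushEvery; // assumed flush interval, in tuples
        private long pending = 0;

        LocalCounter(Aggregator aggregator, long flushEvery) {
            this.aggregator = aggregator;
            this.flushEvery = flushEvery;
        }

        // Called once per incoming tuple; counting stays local and cheap.
        void execute() {
            pending++;
            if (pending >= flushEvery) {
                flush();
            }
        }

        // Sends the partial count downstream, if there is one.
        void flush() {
            if (pending > 0) {
                aggregator.receivePartial(pending);
                pending = 0;
            }
        }
    }

    // Runs `workers` local counters over `tuplesPerWorker` tuples each.
    static Aggregator simulate(int workers, int tuplesPerWorker, long flushEvery) {
        Aggregator aggregator = new Aggregator();
        List<LocalCounter> counters = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            counters.add(new LocalCounter(aggregator, flushEvery));
        }
        for (LocalCounter counter : counters) {
            for (int t = 0; t < tuplesPerWorker; t++) {
                counter.execute();
            }
            counter.flush(); // final flush so leftover counts are not lost
        }
        return aggregator;
    }

    public static void main(String[] args) {
        Aggregator result = simulate(12, 10_000, 1_000);
        // prints "120000 tuples counted via 120 cross-worker messages"
        System.out.println(result.total() + " tuples counted via "
                + result.messagesReceived() + " cross-worker messages");
    }
}
```

With a single counting bolt, each of those 120000 tuples would be sent to that one bolt, and every tuple emitted in a different worker would have to be serialised. In this sketch the 12 hypothetical local counters send only 120 partial counts. A real topology would likely flush on a time interval instead of a tuple count.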

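James's numbers can be checked with some back-of-the-envelope arithmetic. The Storm UI describes a bolt's capacity (last 10 minutes) as (number executed × average execute latency) / measurement time, so a value near 1.0 means its executors are busy almost all the time. The sketch below assumes every tuple costs one execute() call at the average latency and ignores queueing and acking overhead, so its figures are optimistic upper bounds. The helper names are hypothetical.

```java
// Back-of-the-envelope throughput arithmetic; helper names are hypothetical.
final class CapacityMath {

    // Storm UI "capacity": (executed * average execute latency) / measurement window.
    static double capacity(long executed, double executeLatencyMs, double windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    // Upper bound on tuples/s for one executor if every execute() takes latencyMs.
    static double maxTuplesPerSecond(double executeLatencyMs) {
        return 1000.0 / executeLatencyMs;
    }

    // Executors needed for the combined upper bound to reach the target rate
    // (integer ceiling of target * latency / 1000 ms).
    static long executorsNeeded(long targetTuplesPerSecond, long executeLatencyMs) {
        return (targetTuplesPerSecond * executeLatencyMs + 999) / 1000;
    }

    public static void main(String[] args) {
        long tenMinutesMs = 10 * 60 * 1000;
        // An executor that ran 2000 tuples at 300 ms each in 10 minutes is saturated.
        System.out.println("capacity: " + capacity(2000, 300, tenMinutesMs)); // 1.0
        System.out.printf("max per executor at 300 ms: %.2f tuples/s%n", maxTuplesPerSecond(300));
        System.out.println("executors for 10000 tuples/s at 300 ms: "
                + executorsNeeded(10_000, 300)); // 3000
        System.out.println("executors for 10000 tuples/s at 10 ms: "
                + executorsNeeded(10_000, 10)); // 100
    }
}
```

Under these assumptions, a bolt spending 300 ms per tuple tops out at about 3.3 tuples/s per executor. Reaching 10000 tuples/s at that latency would take thousands of executors, while bolts at 10 ms would need about 100.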