On serialization: make sure your custom classes are registered with Kryo;
otherwise Storm may fall back to Java serialization, which is slow.
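
A minimal sketch of what that registration could look like, written against
a plain Map so it runs without Storm on the classpath. The class
ParsedEvent is hypothetical (a stand-in for whatever your parsing bolt
emits). The two keys are the ones Storm's Config class exposes as
TOPOLOGY_KRYO_REGISTER and TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION; with
the real Config object, registerSerialization(ParsedEvent.class) and
setFallBackOnJavaSerialization(false) should have the same effect.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KryoConfSketch {
    // Hypothetical payload class; replace with the types your bolts emit.
    public static class ParsedEvent {
        public String topic;
        public long timestamp;
    }

    // Builds the topology configuration entries related to serialization.
    public static Map<String, Object> buildConf() {
        Map<String, Object> conf = new HashMap<>();

        // Class names listed here are registered with Kryo by Storm.
        List<String> registered = new ArrayList<>();
        registered.add(ParsedEvent.class.getName());
        conf.put("topology.kryo.register", registered);

        // With the fallback disabled, an unregistered class causes an error
        // instead of silently going through Java serialization.
        conf.put("topology.fall.back.on.java.serialization", false);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(buildConf());
    }
}
```

Disabling the fallback is mainly a way to find classes you forgot to
register; you may prefer to leave it enabled once the list is complete.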
On Jun 25, 2014 10:30 AM, "Robert Turner" <[email protected]> wrote:

> Serialisation across workers might be your problem. If you can use the
> "localOrShuffle" grouping and arrange for the number of spouts and bolts
> to be a multiple of the number of workers, this will minimise
> serialisation across workers. If there is only one counting bolt in the
> topology, then every tuple is serialised and sent to the worker hosting that
> single counting bolt. A better approach might be to have one counting bolt
> per worker and aggregate those counts periodically.
>
> Regards
>    Rob Turner.
>
>
> On 24 June 2014 15:10, <[email protected]> wrote:
>
>> Hi all,
>>
>> I am facing a critical performance problem with my Storm topology. I can
>> only process 1000 tuples/sec from Kafka with KafkaSpout. I use standard Storm
>> (not Trident) to build my topology, and the topology details are as follows:
>> [Machines]
>> I have 1 nimbus and 3 supervisors, each with a 2-core CPU, on GCE (Google
>> Compute Engine)
>> Number of workers: 12
>> Number of executors: 51
>> [Topology]
>> Number of KafkaSpouts: 13 (fetching 13 topics from the Kafka brokers)
>> Number of bolts: 12 (5 of them are mysql-dumper bolts)
>>
>> KafkaSpout(topic) emits to boltA and boltB
>> boltA(parallelism=9): parses the Avro tuple from KafkaSpout
>> boltB(parallelism=1): does counting only
>>
>> I found that boltA's capacity is sometimes 1 or above in the Storm UI, and
>> the execute latency of my 5 mysql-dumper bolts is more than 300 ms (the
>> other bolts are under 10 ms). In addition, the complete latency of these
>> KafkaSpouts is more than 2000 ms in the beginning, but it drops to 1000 ms
>> after a while.
>>
>> I found this topology can only process 1000 tuples/s or less, but my goal
>> is to process 10000 tuples/s. Is anything wrong with my topology config?
>> Actually, my topology only does simple things like counting and dumping to
>> MySQL. Storm does not seem to perform as well as advertised (millions of
>> tuples per second on a 10-node cluster). Can anyone give me some suggestions?
>>
>> Thanks a lot.
>>
>> Best regards,
>> James
>
>
>
>
> --
> Cheers
>    Rob.
>
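
Rob's suggestion of one counting bolt per worker with periodic aggregation
can be sketched in plain Java as below. This is not Storm code: the class
names LocalCounter and Aggregator are hypothetical, and flush() stands in
for whatever timer a real bolt would use (in Storm, tick tuples configured
through topology.tick.tuple.freq.secs are one option). The point is only
that each worker sends a small map of partial counts now and then, rather
than forwarding every tuple to a single counter.

```java
import java.util.HashMap;
import java.util.Map;

public class PartialCountSketch {
    // Keeps counts local to one worker; nothing is sent per event.
    public static class LocalCounter {
        private final Map<String, Long> counts = new HashMap<>();

        public void record(String topic) {
            counts.merge(topic, 1L, Long::sum);
        }

        // Returns the partial counts collected so far and resets them,
        // as a bolt might do on a periodic timer.
        public Map<String, Long> flush() {
            Map<String, Long> snapshot = new HashMap<>(counts);
            counts.clear();
            return snapshot;
        }
    }

    // Single downstream aggregator that merges partial counts from all workers.
    public static class Aggregator {
        private final Map<String, Long> totals = new HashMap<>();

        public void merge(Map<String, Long> partial) {
            partial.forEach((topic, n) -> totals.merge(topic, n, Long::sum));
        }

        public long total(String topic) {
            return totals.getOrDefault(topic, 0L);
        }
    }

    public static void main(String[] args) {
        LocalCounter workerA = new LocalCounter();
        LocalCounter workerB = new LocalCounter();
        workerA.record("clicks");
        workerA.record("clicks");
        workerB.record("clicks");
        workerB.record("views");

        Aggregator aggregator = new Aggregator();
        aggregator.merge(workerA.flush());
        aggregator.merge(workerB.flush());
        System.out.println("clicks=" + aggregator.total("clicks")
                + " views=" + aggregator.total("views"));
    }
}
```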

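A rough back-of-the-envelope check on the numbers James reports, as a
sketch only. It assumes each execute() call handles one tuple and blocks
until the MySQL write finishes, which the thread does not confirm. The
capacity formula is the one the Storm UI is generally described as using
(tuples executed times average execute latency, divided by the length of
the measurement window); the class and method names are hypothetical.

```java
public class CapacitySketch {
    // Fraction of the window one executor spent inside execute().
    // Values near 1.0 mean the executor is saturated.
    public static double capacity(long executed, double executeLatencyMs, double windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    // Upper bound on tuples per second for one executor that processes
    // tuples one at a time with the given execute latency.
    public static double maxTuplesPerSecond(double executeLatencyMs) {
        return 1000.0 / executeLatencyMs;
    }

    public static void main(String[] args) {
        // 300 ms is the mysql-dumper execute latency reported in the thread.
        double perExecutor = maxTuplesPerSecond(300.0);
        System.out.printf("per executor: %.2f tuples/s%n", perExecutor);

        // Executors needed to reach the 10000 tuples/s goal at that latency.
        System.out.printf("executors needed: %.0f%n", Math.ceil(10000.0 / perExecutor));
    }
}
```

Under those assumptions a single mysql-dumper executor tops out at a few
tuples per second, so batching the inserts or raising the parallelism of
those bolts is likely to matter more than serialization.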