Hi Cody and James,

Both Kafka brokers and Storm Supervisors run on a hypervisor on the same
machine. The topology runs with the number of workers set to 4, and the
Kafka spout fetch size is set to 25 MB. The parallelism of all components
is 4. maxSpoutPending is set to 8. The transport mechanism is Netty, and
the configuration is pretty standard. All servers run Linux.

Keep in mind that this was a simplified, "consume-only" Trident
transactional topology, with only a spout and a debugging,
throughput-logging Trident filter. Adding a groupBy and a
persistentAggregate (with a MemoryMapState) dropped the throughput to about
160 000 messages/s.  Messages vary in size, from about 1 to 1.5 KB. All this
was with Kryo serialization disabled.
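
For a rough sense of the data volume these figures imply, here is a
back-of-the-envelope sketch. It assumes 1 KB = 1000 bytes and that every
message sits at one end of the 1 to 1.5 KB range; the class and method
names are made up for illustration, and the result is an estimate, not a
measurement from the cluster.

```java
public class ThroughputEstimate {

    // Bytes per second for a given message rate and per-message size.
    static long bytesPerSecond(long messagesPerSecond, long bytesPerMessage) {
        return messagesPerSecond * bytesPerMessage;
    }

    public static void main(String[] args) {
        long rate = 160_000L;      // messages/s, the aggregated-topology figure
        long minSize = 1_000L;     // assumption: 1 KB taken as 1000 bytes
        long maxSize = 1_500L;     // assumption: 1.5 KB taken as 1500 bytes

        long lowMb = bytesPerSecond(rate, minSize) / 1_000_000L;
        long highMb = bytesPerSecond(rate, maxSize) / 1_000_000L;

        // Prints the estimated range of payload volume in MB/s.
        System.out.printf("%d to %d MB/s%n", lowMb, highMb);
    }
}
```

Under those assumptions the aggregated topology moves somewhere between
160 and 240 MB/s of message payload.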

Best regards,

Danijel

On Tuesday, June 24, 2014, Cody A. Ray <[email protected]> wrote:

> Hi Danijel -
>
> What sort of hardware are your Kafka brokers and Storm workers running on
> for the 400k msgs/s Kafka example? (We're also running into a throughput
> problem, but we haven't yet run a simplified topology such as the one you
> mention as a benchmark. I'll email out our specs and stuff in a post to
> the list soon.)
>
> -Cody
>
>
> On Tue, Jun 24, 2014 at 11:13 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> Perhaps MySQL is the bottleneck; I'll try it. However, if some bolt is
>> very busy, will Storm be slower to emit tuples? My messages are Avro
>> records from Kafka, and each Avro message is about 3 KB. What types of
>> messages do you fetch from Kafka?
>>
>> Another important question: which kafka-storm do you use? I see so many
>> different versions of it that I am confused. Can you share the Storm config
>> of your topology and your kafkaSpout's config with me?
>>
>> Thank you very much!
>>
>>
>> Best regards,
>> James Fu
>>
>>
>>
>> Danijel Schiavuzzi <[email protected]> wrote on 2014/6/25 at 12:02 AM:
>>
>> Try to run the topology without the MySQL bolt to find out if that's the
>> bottleneck. Do you update the database in batches?  That's an essential
>> optimization you should implement.
>>
>> With a two node Storm cluster I can fetch 450 000 messages/s from Kafka,
>> and that's with a Trident transactional topology (just the spout and a
>> debug filter bolt). Kafka has two nodes with 4 partitions only. Basic Storm
>> should be faster.
>> On Jun 24, 2014 4:12 PM, <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > I face a critical performance problem with my Storm topology. I can only
>> > process 1000 tuples/sec from Kafka with kafkaSpout. I use standard Storm
>> > for my topology (not Trident), and my topology information is as follows:
>> > [Machines]
>> > I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE
>> > (Google Compute Engine)
>> > Number of workers: 12
>> > Number of executors: 51
>> > [Topology]
>> > Number of kafkaSpouts: 13 (fetching 13 topics from the Kafka brokers)
>> > Number of bolts: 12 (5 of them are mysql-dumper bolts)
>> >
>> > kafkaSpout (topic) emits to boltA and boltB
>> > boltA (parallelism=9): parses the Avro tuples from kafkaSpout
>> > boltB (parallelism=1): does counting only
>> >
>> > I found that boltA's capacity is sometimes 1 or above in the Storm UI, and
>> > the execute latency of my 5 mysql-dumper bolts is more than 300 ms (other
>> > bolts are below 10 ms). In addition, the complete latency of these
>> > kafkaSpouts is more than 2000 ms in the beginning, but it drops to 1000 ms
>> > after a while.
>> >
>> > I found this topology can only process 1000 tuples/s or less, but my goal
>> > is to process 10000 tuples/s. Is anything wrong with my topology config?
>> > Actually, my topology only does simple things like counting and dumping
>> > to MySQL. Storm does not seem to perform as well as advertised (a million
>> > tuples per second on a 10-node cluster). Can anyone give me some
>> > suggestions?
>> >
>> > Thanks a lot.
>> >
>> > Best regards,
>> > James
>>
>>
>
>
> --
> Cody A. Ray, LEED AP
> [email protected]
> 215.501.7891
>


-- 
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
