You can increase the parallelismHint while creating the stream. You can also take a look at this for some more explanation:
https://gist.github.com/mrflip/5958028

Regards
Sai

On Tue, Oct 28, 2014 at 3:35 PM, Sa Li <[email protected]> wrote:

> Thanks Sai, I set spoutConf.fetchSizeBytes = 500*1024*1024; it didn't
> help, so I assume there might be some other unresolved issues.
>
> Thanks
>
> Alec
>
> On Tue, Oct 28, 2014 at 3:18 PM, saiprasad mishra <[email protected]> wrote:
>
>> You can configure how much to fetch from Kafka, in case you have not
>> already done so, with a config like the one below:
>>
>> spoutConf.fetchSizeBytes = 10*1024*1024; // the default is 1024*1024, I guess
>>
>> Regards
>>
>> Sai
>>
>> On Tue, Oct 28, 2014 at 2:59 PM, Sa Li <[email protected]> wrote:
>>
>>> Hi, all
>>>
>>> I have a question about TridentKafkaSpout. I am using TridentKafkaSpout
>>> to consume data from a Kafka cluster, and I found the KafkaSpout is too
>>> slow to be acceptable. This is the code:
>>>
>>> BrokerHosts zk = new ZkHosts("10.100.70.128:2181");
>>> TridentKafkaConfig spoutConf = new TridentKafkaConfig(zk, "topictest");
>>> spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
>>> OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(spoutConf);
>>> TridentTopology topology = new TridentTopology();
>>>
>>> I use two ways to create the stream: one is the KafkaSpout reading data
>>> from Kafka, the other is a RandomTupleSpout generating tuples randomly.
>>>
>>> topology.newStream("topictestspout", kafkaSpout)
>>> //topology.newStream("test", new RandomTupleSpout())
>>>     .persistentAggregate(PostgresqlState.newFactory(config),
>>>         new Fields("userid", "event"), new EventUpdater(),
>>>         new Fields("eventword"));
>>>
>>> I am expecting to process the batch data in the State, but I found the
>>> list size of messages only reaches 4 when using the KafkaSpout, no matter
>>> how large a batch size I set, while with RandomTupleSpout() the actual
>>> batch size in memory can reach a few hundred (not very high). So I am
>>> concluding that the KafkaSpout consumes data too slowly. Is that because
>>> something is wrong with my code, or is there something I need to change
>>> in the Kafka/Storm cluster configuration? Currently I am using a 3-node
>>> (9 brokers) Kafka cluster, a single-node Storm cluster, and a 3-node ZK
>>> ensemble. The above code is run in local mode.
>>>
>>> thanks
>>>
>>> Alec
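For anyone finding this thread later, here is a rough sketch that pulls the suggestions from this thread together: a larger fetchSizeBytes (with bufferSizeBytes kept in step), a parallelismHint on the stream, and a max-spout-pending cap. This is only a sketch, not a verified fix for the original problem: it assumes the 0.9.x-era storm-core/storm-kafka classpath, and PostgresqlState, EventUpdater, and config are the original poster's own classes, not shown here.

```java
// Sketch combining the tuning suggestions from this thread.
// Assumes storm-core and storm-kafka (0.9.x-era packages) on the classpath.
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.tuple.Fields;
import storm.kafka.BrokerHosts;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import storm.kafka.trident.OpaqueTridentKafkaSpout;
import storm.kafka.trident.TridentKafkaConfig;
import storm.trident.TridentTopology;

public class KafkaSpoutTuning {
    public static void main(String[] args) {
        BrokerHosts zk = new ZkHosts("10.100.70.128:2181");
        TridentKafkaConfig spoutConf = new TridentKafkaConfig(zk, "topictest");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Pull more data per fetch; the default fetchSizeBytes is 1 MB.
        spoutConf.fetchSizeBytes = 10 * 1024 * 1024;
        // Keep the consumer's receive buffer in step with the fetch size.
        spoutConf.bufferSizeBytes = 10 * 1024 * 1024;

        OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(spoutConf);

        TridentTopology topology = new TridentTopology();
        topology.newStream("topictestspout", kafkaSpout)
                // One spout executor per Kafka partition (9 brokers were
                // mentioned; the actual partition count is what matters).
                .parallelismHint(9)
                // PostgresqlState, EventUpdater and config are the original
                // poster's classes, assumed to be defined elsewhere.
                .persistentAggregate(PostgresqlState.newFactory(config),
                        new Fields("userid", "event"), new EventUpdater(),
                        new Fields("eventword"));

        Config conf = new Config();
        // Cap in-flight batches so a slow batch doesn't stall the pipeline.
        conf.setMaxSpoutPending(3);
        new LocalCluster().submitTopology("kafka-tuning", conf, topology.build());
    }
}
```

Note that for an opaque Trident Kafka spout the batch handed to the State is bounded by how much one fetch returns per partition, so fetchSizeBytes and the spout's parallelism are the two knobs that most directly affect the batch sizes Alec is observing.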
