FWIW, setting the number of ackers equal to the number of workers gave us an
order-of-magnitude gain in latency on our small EC2 test cluster. Our next step
is to try simple micro-batching with Trident and see how that impacts latency.
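
(For reference, a minimal sketch of that change via the Storm Config API; the
worker count here is illustrative:)

import backtype.storm.Config;

Config conf = new Config();
conf.setNumWorkers(4);  // illustrative; use your actual worker count
conf.setNumAckers(4);   // one acker per worker, as described above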


On Sat, Jun 14, 2014 at 8:28 AM, Haralds Ulmanis <[email protected]>
wrote:

> And what about CPU/network/disk utilization? And the load factors per bolt
> from the Storm UI?
>
>
> On 14 June 2014 15:53, Shaikh Riyaz <[email protected]> wrote:
>
>> Hi,
>>
>> We download 28 million messages daily, and monthly that adds up to
>> 800+ million.
>>
>> We want to push this volume of data through our Kafka and Storm
>> clusters and store the results in our HBase cluster.
>>
>> Our target is to process one month of data in one day. Is that possible?
>>
>> We set up our cluster expecting to process a million messages per second,
>> as reported on the web. Unfortunately, we have ended up processing only
>> 1,200-1,700 messages per second. At that rate it would take at least 10
>> days to process 30 days of data, which is not an acceptable solution in
>> our case. (Working through 800+ million messages in 24 hours would need a
>> sustained rate of roughly 10,000 messages per second.)
>>
>> I suspect we have to change some configuration to achieve this goal. I'm
>> looking for help from the experts here to get there.
>>
>> *Kafka Cluster:*
>> Kafka is running on two dedicated machines, each with 48 GB of RAM and
>> 2 TB of storage. We run an 11-node Kafka cluster (11 broker instances)
>> spread across these two servers.
>>
>> *Kafka Configuration:*
>> producer.type=async
>> compression.codec=none
>> request.required.acks=-1
>> serializer.class=kafka.serializer.StringEncoder
>> queue.buffering.max.ms=100000
>> batch.num.messages=10000
>> queue.buffering.max.messages=100000
>> default.replication.factor=3
>> controlled.shutdown.enable=true
>> auto.leader.rebalance.enable=true
>> num.network.threads=2
>> num.io.threads=8
>> num.partitions=4
>> log.retention.hours=12
>> log.segment.bytes=536870912
>> log.retention.check.interval.ms=60000
>> log.cleaner.enable=false
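>>
>> (For context, a sketch of how a producer with the async settings above
>> might be constructed against the Kafka 0.8 producer API; the broker list
>> and topic name are illustrative:)
>>
>> import java.util.Properties;
>> import kafka.javaapi.producer.Producer;
>> import kafka.producer.KeyedMessage;
>> import kafka.producer.ProducerConfig;
>>
>> Properties props = new Properties();
>> props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // illustrative
>> props.put("producer.type", "async");
>> props.put("serializer.class", "kafka.serializer.StringEncoder");
>> props.put("batch.num.messages", "10000");
>> props.put("queue.buffering.max.ms", "100000");
>>
>> Producer<String, String> producer =
>>     new Producer<String, String>(new ProducerConfig(props));
>> producer.send(new KeyedMessage<String, String>("my-topic", "payload")); // topic illustrative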
>>
>> *Storm Cluster:*
>> Storm is running with 5 supervisors and 1 nimbus on IBM servers, each
>> with 48 GB of RAM and 8 TB of storage. These servers are shared with the
>> HBase cluster.
>>
>> *Kafka spout configuration*
>> kafkaConfig.bufferSizeBytes = 1024*1024*8;
>> kafkaConfig.fetchSizeBytes = 1024*1024*4;
>> kafkaConfig.forceFromStart = true;
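>>
>> (For completeness, this is roughly how the spout is wired up with the
>> storm-kafka SpoutConfig; the ZooKeeper connect string, topic, zkRoot, and
>> consumer id are illustrative:)
>>
>> import backtype.storm.spout.SchemeAsMultiScheme;
>> import storm.kafka.KafkaSpout;
>> import storm.kafka.SpoutConfig;
>> import storm.kafka.StringScheme;
>> import storm.kafka.ZkHosts;
>>
>> ZkHosts hosts = new ZkHosts("zk1:2181,zk2:2181");  // illustrative
>> SpoutConfig kafkaConfig =
>>     new SpoutConfig(hosts, "my-topic", "/kafka-spout", "spout-id"); // illustrative
>> kafkaConfig.bufferSizeBytes = 1024 * 1024 * 8;
>> kafkaConfig.fetchSizeBytes = 1024 * 1024 * 4;
>> kafkaConfig.forceFromStart = true;
>> kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
>> KafkaSpout spout = new KafkaSpout(kafkaConfig);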
>>
>> *Topology: StormTopology*
>> Spout       - partitions: 4
>> First Bolt  - parallelism hint: 6, num tasks: 5
>> Second Bolt - parallelism hint: 5
>> Third Bolt  - parallelism hint: 3
>> Fourth Bolt - parallelism hint: 3, num tasks: 4
>> Fifth Bolt  - parallelism hint: 3
>> Sixth Bolt  - parallelism hint: 3
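>>
>> (A sketch of how this is wired with TopologyBuilder; the bolt class names
>> and shuffle groupings are placeholders:)
>>
>> import backtype.storm.topology.TopologyBuilder;
>>
>> TopologyBuilder builder = new TopologyBuilder();
>> builder.setSpout("kafka-spout", spout, 4);           // one executor per partition
>> builder.setBolt("first-bolt", new FirstBolt(), 6)    // placeholder bolt class
>>        .setNumTasks(5)
>>        .shuffleGrouping("kafka-spout");
>> builder.setBolt("second-bolt", new SecondBolt(), 5)  // placeholder bolt class
>>        .shuffleGrouping("first-bolt");
>> // remaining bolts are declared the same way down the chain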
>>
>> *Supervisor configuration:*
>>
>> storm.local.dir: "/app/storm"
>> storm.zookeeper.port: 2181
>> storm.cluster.mode: "distributed"
>> storm.local.mode.zmq: false
>> supervisor.slots.ports:
>>     - 6700
>>     - 6701
>>     - 6702
>>     - 6703
>> supervisor.worker.start.timeout.secs: 180
>> supervisor.worker.timeout.secs: 30
>> supervisor.monitor.frequency.secs: 3
>> supervisor.heartbeat.frequency.secs: 5
>> supervisor.enable: true
>>
>> storm.messaging.netty.server_worker_threads: 2
>> storm.messaging.netty.client_worker_threads: 2
>> storm.messaging.netty.buffer_size: 52428800 #50MB buffer
>> storm.messaging.netty.max_retries: 25
>> storm.messaging.netty.max_wait_ms: 1000
>> storm.messaging.netty.min_wait_ms: 100
>>
>>
>> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
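>>
>> (For what it's worth: 4 slots on each of 5 supervisors allows up to 20
>> workers, and at -Xmx2048m each that is up to 8 GB of worker heap per
>> machine, out of the 48 GB shared with HBase.)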
>>
>>
>> Please let me know if more information is needed.
>>
>> Thanks in advance.
>>
>> --
>> Regards,
>>
>> Riyaz
>>
>>
>
