FWIW, setting the number of ackers to the number of workers gave us an order-of-magnitude gain in latency on our small EC2 test cluster. Our next step is to try simple micro-batching with Trident and see how that impacts latency.
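In case it helps anyone reproduce this, the change amounts to something like the following (a minimal sketch against the 0.9.x Java API; the worker count of 4 is illustrative, not our actual setting):

    import backtype.storm.Config;

    // A single acker executor can bottleneck ack processing and inflate
    // complete latency; giving each worker its own acker avoids that.
    int numWorkers = 4; // illustrative; use your real worker count
    Config conf = new Config();
    conf.setNumWorkers(numWorkers);
    conf.setNumAckers(numWorkers); // ackers == workers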
On Sat, Jun 14, 2014 at 8:28 AM, Haralds Ulmanis <[email protected]> wrote:
> And what about cpu/network/disk utilization? And load factors per bolt
> from storm UI?
>
> On 14 June 2014 15:53, Shaikh Riyaz <[email protected]> wrote:
>
>> Hi,
>>
>> We download 28 million messages daily, and monthly it goes up to 800+
>> million.
>>
>> We want to process this volume of data through our Kafka and Storm
>> cluster and store it in our HBase cluster.
>>
>> We are targeting processing one month of data in one day. Is that
>> possible?
>>
>> We set up our cluster expecting to process a million messages per
>> second, as mentioned on the web. Unfortunately, we have ended up
>> processing only 1200-1700 messages per second. If we continue at this
>> speed, it will take at least 10 days to process 30 days of data, which
>> is not an acceptable solution in our case.
>>
>> I suspect that we have to change some configuration to achieve this
>> goal. Looking for help from experts to support me in achieving this
>> task.
>>
>> *Kafka Cluster:*
>> Kafka is running on two dedicated machines, each with 48 GB of RAM and
>> 2 TB of storage. We have an 11-node Kafka cluster in total, spread
>> across these two servers.
>>
>> *Kafka Configuration:*
>> producer.type=async
>> compression.codec=none
>> request.required.acks=-1
>> serializer.class=kafka.serializer.StringEncoder
>> queue.buffering.max.ms=100000
>> batch.num.messages=10000
>> queue.buffering.max.messages=100000
>> default.replication.factor=3
>> controlled.shutdown.enable=true
>> auto.leader.rebalance.enable=true
>> num.network.threads=2
>> num.io.threads=8
>> num.partitions=4
>> log.retention.hours=12
>> log.segment.bytes=536870912
>> log.retention.check.interval.ms=60000
>> log.cleaner.enable=false
>>
>> *Storm Cluster:*
>> Storm is running with 5 supervisors and 1 nimbus on IBM servers with
>> 48 GB of RAM and 8 TB of storage. These servers are shared with the
>> HBase cluster.
>>
>> *Kafka spout configuration:*
>> kafkaConfig.bufferSizeBytes = 1024*1024*8;
>> kafkaConfig.fetchSizeBytes = 1024*1024*4;
>> kafkaConfig.forceFromStart = true;
>>
>> *Topology: StormTopology*
>> Spout - Partitions: 4
>> First Bolt - parallelism hint: 6 and Num tasks: 5
>> Second Bolt - parallelism hint: 5
>> Third Bolt - parallelism hint: 3
>> Fourth Bolt - parallelism hint: 3 and Num tasks: 4
>> Fifth Bolt - parallelism hint: 3
>> Sixth Bolt - parallelism hint: 3
>>
>> *Supervisor configuration:*
>>
>> storm.local.dir: "/app/storm"
>> storm.zookeeper.port: 2181
>> storm.cluster.mode: "distributed"
>> storm.local.mode.zmq: false
>> supervisor.slots.ports:
>>     - 6700
>>     - 6701
>>     - 6702
>>     - 6703
>> supervisor.worker.start.timeout.secs: 180
>> supervisor.worker.timeout.secs: 30
>> supervisor.monitor.frequency.secs: 3
>> supervisor.heartbeat.frequency.secs: 5
>> supervisor.enable: true
>>
>> storm.messaging.netty.server_worker_threads: 2
>> storm.messaging.netty.client_worker_threads: 2
>> storm.messaging.netty.buffer_size: 52428800 # 50 MB buffer
>> storm.messaging.netty.max_retries: 25
>> storm.messaging.netty.max_wait_ms: 1000
>> storm.messaging.netty.min_wait_ms: 100
>>
>> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
>>
>> Please let me know if more information is needed.
>>
>> Thanks in advance.
>>
>> --
>> Regards,
>>
>> Riyaz
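PS: for anyone mapping the numbers quoted above onto code, the spout and topology setup would look roughly like this with the storm-kafka 0.9.x Java API. This is a sketch only: the ZooKeeper connect string, topic name, and bolt classes are placeholders, not Riyaz's actual code.

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;
    import storm.kafka.BrokerHosts;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    public class ExampleTopology {

        // Placeholder bolt standing in for the poster's first bolt.
        public static class FirstBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                // real processing would go here
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // no output fields for this stub
            }
        }

        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper connect string and topic name.
            BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181");
            SpoutConfig kafkaConfig =
                new SpoutConfig(hosts, "messages", "/kafkaspout", "spout-id");
            kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

            // The spout settings quoted in the thread above.
            kafkaConfig.bufferSizeBytes = 1024 * 1024 * 8;
            kafkaConfig.fetchSizeBytes  = 1024 * 1024 * 4;
            kafkaConfig.forceFromStart  = true;

            TopologyBuilder builder = new TopologyBuilder();
            // Spout parallelism matching the 4 Kafka partitions.
            builder.setSpout("kafka-spout", new KafkaSpout(kafkaConfig), 4);
            // First bolt: hint 6, num tasks 5, as listed above (note that
            // Storm caps executors at the task count, so this runs 5 executors).
            builder.setBolt("first-bolt", new FirstBolt(), 6)
                   .setNumTasks(5)
                   .shuffleGrouping("kafka-spout");
            // ...remaining bolts wired the same way...

            Config conf = new Config();
            conf.setNumWorkers(4);
            conf.setNumAckers(4); // one acker per worker, per the tip above
            StormSubmitter.submitTopology("example-topology", conf,
                                          builder.createTopology());
        }
    }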
