And what about CPU/network/disk utilization? And the load factors per bolt from the Storm UI?
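For reference, a minimal sketch of how a per-bolt load figure can be read, assuming "load factor" here means the Capacity column in the Storm UI, which is derived from the executed count and the average execute latency over a 10-minute window. The function name and the sample numbers below are hypothetical and are not measurements from this cluster.

```python
# Sketch of the Storm UI "capacity" figure for one bolt.
# Assumption: capacity = executed tuples * avg execute latency / window length.
# A value close to 1.0 suggests the bolt is busy nearly all the time.

WINDOW_MS = 10 * 60 * 1000  # the UI reports capacity over the last 10 minutes


def bolt_capacity(executed, execute_latency_ms, window_ms=WINDOW_MS):
    """Fraction of the window the bolt spent executing tuples."""
    return executed * execute_latency_ms / window_ms


if __name__ == "__main__":
    # Hypothetical numbers: 900,000 tuples executed in the window
    # (about 1,500 per second) at 0.6 ms average execute latency.
    print(round(bolt_capacity(900_000, 0.6), 2))
```

A bolt whose value sits near 1.0 is the likely bottleneck and the first candidate for a higher parallelism hint.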
On 14 June 2014 15:53, Shaikh Riyaz <[email protected]> wrote:

> Hi,
>
> We download 28 million messages daily, which adds up to 800+ million
> per month.
>
> We want to process this amount of data through our Kafka and Storm
> cluster and store it in an HBase cluster.
>
> We are targeting to process one month of data in one day. Is it possible?
>
> We set up our cluster thinking that we could process millions of messages
> per second, as mentioned on the web. Unfortunately, we have ended up
> processing only 1200-1700 messages per second. If we continue at this
> speed, it will take at least 10 days to process 30 days of data, which is
> not an acceptable solution in our case.
>
> I suspect that we have to change some configuration to achieve this goal.
> Looking for help from experts to support me in achieving this task.
>
> *Kafka Cluster:*
> Kafka is running on two dedicated machines with 48 GB of RAM and 2TB of
> storage. We have an 11-node Kafka cluster in total, spread across these
> two servers.
>
> *Kafka Configuration:*
> producer.type=async
> compression.codec=none
> request.required.acks=-1
> serializer.class=kafka.serializer.StringEncoder
> queue.buffering.max.ms=100000
> batch.num.messages=10000
> queue.buffering.max.messages=100000
> default.replication.factor=3
> controlled.shutdown.enable=true
> auto.leader.rebalance.enable=true
> num.network.threads=2
> num.io.threads=8
> num.partitions=4
> log.retention.hours=12
> log.segment.bytes=536870912
> log.retention.check.interval.ms=60000
> log.cleaner.enable=false
>
> *Storm Cluster:*
> Storm is running with 5 supervisors and 1 nimbus on IBM servers with
> 48 GB of RAM and 8TB of storage. These servers are shared with the HBase
> cluster.
>
> *Kafka spout configuration*
> kafkaConfig.bufferSizeBytes = 1024*1024*8;
> kafkaConfig.fetchSizeBytes = 1024*1024*4;
> kafkaConfig.forceFromStart = true;
>
> *Topology: StormTopology*
> Spout - Partition: 4
> First Bolt - parallelism hint: 6 and Num tasks: 5
> Second Bolt - parallelism hint: 5
> Third Bolt - parallelism hint: 3
> Fourth Bolt - parallelism hint: 3 and Num tasks: 4
> Fifth Bolt - parallelism hint: 3
> Sixth Bolt - parallelism hint: 3
>
> *Supervisor configuration:*
>
> storm.local.dir: "/app/storm"
> storm.zookeeper.port: 2181
> storm.cluster.mode: "distributed"
> storm.local.mode.zmq: false
> supervisor.slots.ports:
>     - 6700
>     - 6701
>     - 6702
>     - 6703
> supervisor.worker.start.timeout.secs: 180
> supervisor.worker.timeout.secs: 30
> supervisor.monitor.frequency.secs: 3
> supervisor.heartbeat.frequency.secs: 5
> supervisor.enable: true
>
> storm.messaging.netty.server_worker_threads: 2
> storm.messaging.netty.client_worker_threads: 2
> storm.messaging.netty.buffer_size: 52428800 #50MB buffer
> storm.messaging.netty.max_retries: 25
> storm.messaging.netty.max_wait_ms: 1000
> storm.messaging.netty.min_wait_ms: 100
>
> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
>
> Please let me know if more information is needed.
>
> Thanks in advance.
>
> --
> Regards,
>
> Riyaz
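To put the target from the quoted message in numbers, here is a rough back-of-the-envelope sketch. It assumes exactly 800 million messages per month and a perfectly sustained rate, which a real cluster will not hold, so treat the results as estimates only. The function names are made up for illustration.

```python
# Rough throughput arithmetic for the "one month of data in one day" target.
# Assumptions: 800 million messages per month, constant processing rate.

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(messages, seconds):
    """Sustained messages per second needed to finish within `seconds`."""
    return messages / seconds


def days_to_process(messages, rate_per_sec):
    """Days needed to process `messages` at a constant `rate_per_sec`."""
    return messages / rate_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    monthly_messages = 800_000_000

    target = required_rate(monthly_messages, SECONDS_PER_DAY)
    print(f"rate needed for one month in one day: {target:.0f} msg/s")

    # The observed range reported in the quoted message.
    for observed in (1200, 1700):
        days = days_to_process(monthly_messages, observed)
        print(f"at {observed} msg/s: {days:.1f} days, "
              f"speed-up needed: {target / observed:.1f}x")
```

Under these assumptions the target works out to a little under 9,300 messages per second, so the topology would need to run several times faster than the 1200-1700 messages per second observed so far.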
