And what about CPU, network, and disk utilization? And the load factor of
each bolt from the Storm UI?
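For reference, the per-bolt load figure in the Storm UI is the "Capacity (last 10m)" column. As far as I know, it is computed roughly as executed tuples times average execute latency, divided by the window length. A value near 1.0 means the bolt is busy in execute() for the whole window, so it is saturated. Here is a minimal sketch of that arithmetic. The formula is my assumption about how the UI derives it, and the sample numbers are hypothetical, not taken from your cluster:

```java
/**
 * Sketch of the Storm UI "capacity" arithmetic (assumed formula):
 * capacity = executed * executeLatencyMs / (windowSecs * 1000).
 * The inputs used in main() are HYPOTHETICAL sample values.
 */
public class BoltCapacity {

    /** Fraction of the window that the bolt spent inside execute(). */
    static double capacity(long executed, double executeLatencyMs, long windowSecs) {
        return executed * executeLatencyMs / (windowSecs * 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical bolt: 600,000 tuples executed in the 10-minute window,
        // averaging 0.9 ms per execute() call.
        double c = capacity(600_000L, 0.9, 600L);
        // Values close to 1.0 suggest this bolt is a bottleneck.
        System.out.printf("capacity = %.2f%n", c);
    }
}
```

A bolt with a capacity near or above 1.0 is the first candidate for a higher parallelism hint.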


On 14 June 2014 15:53, Shaikh Riyaz <[email protected]> wrote:

> Hi,
>
> Every day we download 28 million messages, and monthly this goes up to
> 800+ million.
>
> We want to process this volume of data through our Kafka and Storm clusters
> and store it in our HBase cluster.
>
> Our target is to process one month of data in one day. Is that possible?
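To put the target in numbers, here is a back-of-the-envelope sketch of the sustained rate needed to get through one month of messages in a single day. It uses only the volumes quoted above and assumes processing runs continuously for 86,400 seconds:

```java
/**
 * Back-of-the-envelope throughput needed to process one month of messages
 * in a single day, using the volumes quoted in this thread.
 */
public class RequiredThroughput {

    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Messages per second needed to process totalMessages within one day. */
    static double requiredRate(long totalMessages) {
        return (double) totalMessages / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long monthly = 800_000_000L;        // "800+ million" per month
        long fromDaily = 28_000_000L * 30;  // 28 million/day over 30 days = 840 million
        System.out.printf("800M in one day: %.0f msg/s%n", requiredRate(monthly));
        System.out.printf("840M in one day: %.0f msg/s%n", requiredRate(fromDaily));
    }
}
```

The target is roughly 9,300 to 9,700 messages per second, sustained end to end through the topology and into HBase.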
>
> We set up our cluster expecting to process a million messages per second,
> as claimed on the web. Unfortunately, we ended up processing only
> 1,200-1,700 messages per second. If we continue at this speed, it will take
> at least 10 days to process 30 days of data, which is not acceptable in
> our case.
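For comparison, the following sketch computes how long 30 days of data (28 million x 30 = 840 million messages) would take at the observed rates. It assumes the rate holds steady with no downtime or replays:

```java
/**
 * Days needed to process a backlog at a constant observed rate,
 * assuming no downtime, replays, or rate changes.
 */
public class ProcessingTime {

    /** Days needed to process totalMessages at a sustained msgPerSec. */
    static double daysNeeded(long totalMessages, double msgPerSec) {
        return totalMessages / msgPerSec / 86_400.0;
    }

    public static void main(String[] args) {
        long thirtyDays = 28_000_000L * 30; // 840 million messages
        for (double rate : new double[] {1_200, 1_700}) {
            System.out.printf("%.0f msg/s -> %.1f days%n", rate, daysNeeded(thirtyDays, rate));
        }
    }
}
```

The pure arithmetic gives roughly 5.7 to 8.1 days of continuous processing. In either case, throughput would need to rise by a factor of about 6 to 8 to reach the one-day target.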
>
> I suspect we need to change some configuration to reach this goal. I am
> looking for help from the experts here to achieve this.
>
> *Kafka Cluster:*
> Kafka runs on two dedicated machines with 48 GB of RAM and 2 TB of
> storage. Our Kafka cluster has 11 nodes in total, spread across these two
> servers.
>
> *Kafka Configuration:*
> producer.type=async
> compression.codec=none
> request.required.acks=-1
> serializer.class=kafka.serializer.StringEncoder
> queue.buffering.max.ms=100000
> batch.num.messages=10000
> queue.buffering.max.messages=100000
> default.replication.factor=3
> controlled.shutdown.enable=true
> auto.leader.rebalance.enable=true
> num.network.threads=2
> num.io.threads=8
> num.partitions=4
> log.retention.hours=12
> log.segment.bytes=536870912
> log.retention.check.interval.ms=60000
> log.cleaner.enable=false
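For the producer settings above: according to the Kafka 0.8 producer documentation, the async producer sends a batch when either batch.num.messages messages are ready or queue.buffering.max.ms has elapsed, whichever comes first. The sketch below applies that rule to the configured values (10,000 messages, 100,000 ms). The input rates are hypothetical and only illustrate how the configured limits interact:

```java
/**
 * Sketch: delay before the Kafka 0.8 async producer sends a batch, assuming
 * a batch is flushed when batch.num.messages have accumulated or
 * queue.buffering.max.ms has elapsed, whichever comes first.
 */
public class ProducerBatching {

    /** Milliseconds until a batch is sent at a steady msgPerSec. */
    static double flushDelayMs(double msgPerSec, int batchNumMessages, long queueBufferingMaxMs) {
        double timeToFillMs = batchNumMessages / msgPerSec * 1000.0;
        return Math.min(timeToFillMs, queueBufferingMaxMs);
    }

    public static void main(String[] args) {
        int batchNumMessages = 10_000;       // batch.num.messages
        long queueBufferingMaxMs = 100_000L; // queue.buffering.max.ms (100 s)
        // HYPOTHETICAL producer rates, for illustration only.
        for (double rate : new double[] {100, 1_500, 10_000}) {
            System.out.printf("%,.0f msg/s -> batch sent after ~%,.0f ms%n",
                    rate, flushDelayMs(rate, batchNumMessages, queueBufferingMaxMs));
        }
    }
}
```

At low rates, messages can sit in the producer queue for up to 100 seconds before they reach Kafka. Note also that request.required.acks=-1 makes each send wait for all in-sync replicas.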
>
> *Storm Cluster:*
> Storm runs with 5 supervisors and 1 Nimbus on IBM servers with 48 GB
> of RAM and 8 TB of storage. These servers are shared with the HBase cluster.
>
> *Kafka Spout Configuration:*
> kafkaConfig.bufferSizeBytes = 1024*1024*8;
> kafkaConfig.fetchSizeBytes = 1024*1024*4;
> kafkaConfig.forceFromStart = true;
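As a rough sizing check on the spout settings: in storm-kafka, fetchSizeBytes caps how much data one fetch request pulls from a single partition. bufferSizeBytes is the consumer socket buffer. The sketch below estimates messages per fetch and the inbound data rate at the ~9,300 msg/s target. The 1 KB average message size is a HYPOTHETICAL placeholder, since the thread does not state it:

```java
/**
 * Rough spout sizing: messages per fetch and inbound bandwidth at the target
 * rate. The average message size is a HYPOTHETICAL placeholder.
 */
public class FetchSizing {

    /** Whole messages of avgMessageBytes that fit in one fetch of fetchSizeBytes. */
    static long messagesPerFetch(long fetchSizeBytes, long avgMessageBytes) {
        return fetchSizeBytes / avgMessageBytes;
    }

    public static void main(String[] args) {
        long fetchSizeBytes = 1024L * 1024 * 4; // kafkaConfig.fetchSizeBytes (4 MB)
        long avgMessageBytes = 1024;            // HYPOTHETICAL: 1 KB average message
        double targetRate = 9_260;              // ~800 million messages / 86,400 s
        long perFetch = messagesPerFetch(fetchSizeBytes, avgMessageBytes);
        double mbPerSec = targetRate * avgMessageBytes / (1024.0 * 1024.0);
        System.out.println("messages per fetch (per partition): " + perFetch);
        System.out.printf("inbound data at target rate: %.1f MB/s%n", mbPerSec);
    }
}
```

If the real messages are around this size, the fetch settings alone are unlikely to explain a 1,200-1,700 msg/s ceiling. Measuring the actual average message size would confirm this.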
>
> *Topology: StormTopology*
> Spout        - partitions: 4
> First bolt   - parallelism hint: 6, num tasks: 5
> Second bolt  - parallelism hint: 5
> Third bolt   - parallelism hint: 3
> Fourth bolt  - parallelism hint: 3, num tasks: 4
> Fifth bolt   - parallelism hint: 3
> Sixth bolt   - parallelism hint: 3
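The Storm parallelism documentation states that the number of executors (threads) for a component never exceeds its number of tasks. The First Bolt (hint 6, 5 tasks) therefore runs at most 5 executors. With storm-kafka, each Kafka partition is read by a single spout task, so 4 partitions cap useful spout parallelism at 4. The sketch below totals the executors for the topology as listed. It assumes the spout runs 4 executors and that components without an explicit task count use one task per executor:

```java
/**
 * Sketch: effective executors per component, using the rule that a
 * component never runs more executors than tasks. The spout count of 4
 * is an ASSUMPTION (one executor per Kafka partition).
 */
public class ExecutorCount {

    static final String[] NAMES = {
        "spout", "bolt 1", "bolt 2", "bolt 3", "bolt 4", "bolt 5", "bolt 6"};
    // {parallelism hint, num tasks}; num tasks defaults to the hint when not set.
    static final int[][] SPECS = {
        {4, 4}, {6, 5}, {5, 5}, {3, 3}, {3, 4}, {3, 3}, {3, 3}};

    static int effectiveExecutors(int parallelismHint, int numTasks) {
        return Math.min(parallelismHint, numTasks);
    }

    static int totalExecutors(int[][] specs) {
        int total = 0;
        for (int[] s : specs) {
            total += effectiveExecutors(s[0], s[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SPECS.length; i++) {
            System.out.println(NAMES[i] + ": "
                    + effectiveExecutors(SPECS[i][0], SPECS[i][1]) + " executor(s)");
        }
        System.out.println("total executors (excluding ackers): " + totalExecutors(SPECS));
    }
}
```

That comes to 26 executors. With only 4 Kafka partitions, raising num.partitions is probably a prerequisite for adding more spout parallelism.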
>
> *Supervisor configuration:*
>
> storm.local.dir: "/app/storm"
> storm.zookeeper.port: 2181
> storm.cluster.mode: "distributed"
> storm.local.mode.zmq: false
> supervisor.slots.ports:
>     - 6700
>     - 6701
>     - 6702
>     - 6703
> supervisor.worker.start.timeout.secs: 180
> supervisor.worker.timeout.secs: 30
> supervisor.monitor.frequency.secs: 3
> supervisor.heartbeat.frequency.secs: 5
> supervisor.enable: true
>
> storm.messaging.netty.server_worker_threads: 2
> storm.messaging.netty.client_worker_threads: 2
> storm.messaging.netty.buffer_size: 52428800 #50MB buffer
> storm.messaging.netty.max_retries: 25
> storm.messaging.netty.max_wait_ms: 1000
> storm.messaging.netty.min_wait_ms: 100
>
>
> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
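The slot and heap settings above imply a per-host memory budget, which matters because these hosts also run HBase. The sketch below adds up the maximum Storm heap per supervisor host: 4 slots x 2048 MB workers plus the 1024 MB supervisor. It ASSUMES each server has 48 GB of its own. It counts only -Xmx heap; real JVM footprint (off-heap, Netty buffers, thread stacks) is larger:

```java
/**
 * Sketch: maximum Storm JVM heap per supervisor host versus host RAM.
 * Counts -Xmx only; actual process memory will be higher.
 * ASSUMES 48 GB of RAM per individual server.
 */
public class StormMemoryBudget {

    static long heapPerHostMb(int slots, long workerXmxMb, long supervisorXmxMb) {
        return slots * workerXmxMb + supervisorXmxMb;
    }

    public static void main(String[] args) {
        int supervisors = 5;
        int slotsPerSupervisor = 4;  // supervisor.slots.ports 6700-6703
        long workerXmxMb = 2048;     // worker.childopts -Xmx2048m
        long supervisorXmxMb = 1024; // supervisor.childopts -Xmx1024m
        long hostRamMb = 48L * 1024; // ASSUMED 48 GB per server
        long heap = heapPerHostMb(slotsPerSupervisor, workerXmxMb, supervisorXmxMb);
        System.out.println("worker slots in cluster: " + supervisors * slotsPerSupervisor);
        System.out.println("max Storm heap per host (MB): " + heap);
        System.out.println("RAM left for HBase, OS, etc. (MB): " + (hostRamMb - heap));
    }
}
```

This gives 20 worker slots in total and about 9 GB of Storm heap per host. Memory does not look like the constraint on paper. CPU and disk contention with HBase on the shared hosts would be worth checking in the utilization numbers.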
>
>
> Please let me know if more information is needed.
>
> Thanks in advance.
>
> --
> Regards,
>
> Riyaz
>
>
