Below are my topology configuration and the topology stats based on that configuration. Can anyone help me optimize Storm so it processes the 20 million records faster?
Topology stats

Window        Emitted  Transferred  Complete latency (ms)  Acked   Failed
10m 0s        111420   111420       47305.488              286100  0
3h 0m 0s      111420   111420       47305.488              286100  0
1d 0h 0m 0s   111420   111420       47305.488              286100  0
All time      111420   111420       47305.488              286100  0

Topology configuration

dev.zookeeper.path: /tmp/dev-storm-zookeeper
drpc.childopts: -Xmx768m
drpc.invocations.port: 3773
drpc.port: 3772
drpc.queue.size: 128
drpc.request.timeout.secs: 600
drpc.worker.threads: 64
java.library.path: /usr/local/lib:/opt/local/lib:/usr/lib
logviewer.appender.name: A1
logviewer.childopts: -Xmx128m
logviewer.port: 8000
nimbus.childopts: -Xmx1024m
nimbus.cleanup.inbox.freq.secs: 600
nimbus.file.copy.expiration.secs: 600
nimbus.host: mystormserver
nimbus.inbox.jar.expiration.secs: 3600
nimbus.monitor.freq.secs: 10
nimbus.reassign: true
nimbus.supervisor.timeout.secs: 60
nimbus.task.launch.secs: 120
nimbus.task.timeout.secs: 30
nimbus.thrift.max_buffer_size: 1048576
nimbus.thrift.port: 6627
nimbus.topology.validator: backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode: distributed
storm.config.properties: [object Object]
storm.id: CEXPStormTopology-1-1411442050
storm.local.dir: /data/disk00/storm/localdir
storm.local.mode.zmq: false
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.flush.check.interval.ms: 10
storm.messaging.netty.max_retries: 30
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.transfer.batch.size: 262144
storm.messaging.transport: backtype.storm.messaging.netty.Context
storm.thrift.transport: backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout: 15000
storm.zookeeper.port: 2181
storm.zookeeper.retry.interval: 1000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.zookeeper.retry.times: 5
storm.zookeeper.root: /storm
storm.zookeeper.servers: mystormserver
storm.zookeeper.session.timeout: 20000
supervisor.childopts: -Xmx256m
supervisor.enable: true
supervisor.heartbeat.frequency.secs: 5
supervisor.monitor.frequency.secs: 3
supervisor.slots.ports: 6700,6701,6702,6703,6704,6705,6706,6707,6708,6709,6710,6711,6712,6713,6714,6715,6716,6717,6718,6719,6720,6721,6722,6723,6724,6725,6726,6727,6728
supervisor.worker.start.timeout.secs: 120
supervisor.worker.timeout.secs: 30
task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10
topology.acker.executors: 1000
topology.builtin.metrics.bucket.size.secs: 60
topology.debug: true
topology.disruptor.wait.strategy: com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts: true
topology.error.throttle.interval.secs: 10
topology.executor.receive.buffer.size: 65536
topology.executor.send.buffer.size: 65536
topology.fall.back.on.java.serialization: true
topology.kryo.decorators:
topology.kryo.factory: backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register: [object Object]
topology.max.error.report.per.interval: 5
topology.max.spout.pending: 5000
topology.max.task.parallelism: 100
topology.message.timeout.secs: 60
topology.multilang.serializer: backtype.storm.multilang.JsonSerializer
topology.name: CEXPStormTopology
topology.receiver.buffer.size: 8
topology.skip.missing.kryo.registrations: false
topology.sleep.spout.wait.strategy.time.ms: 1
topology.spout.wait.strategy: backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.tasks:
topology.tick.tuple.freq.secs:
topology.transfer.buffer.size: 32
topology.trident.batch.emit.interval.millis: 500
topology.tuple.serializer: backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts:
topology.worker.receiver.thread.count: 1
topology.worker.shared.thread.pool.size: 4
topology.workers: 20
transactional.zookeeper.port:
transactional.zookeeper.root: /transactional
transactional.zookeeper.servers:
ui.childopts: -Xmx768m
ui.port: 8080
worker.childopts: -Xmx1024m
worker.heartbeat.frequency.secs: 1
zmq.hwm: 0
zmq.linger.millis: 5000
zmq.threads: 1

--
Kushan Maskey
817.403.7500

On Mon, Sep 22, 2014 at 9:25 PM, Kushan Maskey <kushan.mas...@mmillerassociates.com> wrote:

> Here is my Storm config:
>
> storm.config.setMaxTaskParallelism=4
> storm.config.setNumWorkers=20
> storm.config.setMaxSpoutPending=5000
> storm.config.numAckers=1000
>
> I am guessing I need to increase maxTaskParallelism. If that is the case, how much would you suggest? Any help will be highly appreciated.
>
> Thanks.
>
> --
> Kushan Maskey
>
> On Mon, Sep 22, 2014 at 9:20 PM, Michael Rose <mich...@fullcontact.com> wrote:
>
>> Storm is not your bottleneck. Check your Storm code to 1) ensure you're parallelizing your writes and 2) ensure you're batching writes to your external resources if possible. Some quick napkin math shows you're only doing about 110 writes/s, which seems awfully low.
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> mich...@fullcontact.com
>>
>> On Mon, Sep 22, 2014 at 8:05 PM, Kushan Maskey <kushan.mas...@mmillerassociates.com> wrote:
>>
>>> I am trying to load 20M records into a Cassandra database through Kafka-Storm. I am able to post all the data into Kafka in 5 minutes, but reading it from Storm and inserting it into Cassandra, Couch and Solr is very slow. It has been running for the past 5 hours and has loaded only 2 million records so far.
>>>
>>> How do I make Storm perform faster? At this pace it will take a couple of days to load all the data.
>>>
>>> --
>>> Kushan Maskey
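Michael's napkin math can be reproduced from the figures in the thread: 2 million records loaded in 5 hours through Storm, compared with all 20 million published to Kafka in 5 minutes. A small Python sketch (the function and variable names are illustrative, not from the thread):

```python
def records_per_second(records: int, seconds: float) -> float:
    """Average throughput over a time window."""
    return records / seconds


# Figures quoted in the thread.
TOTAL_RECORDS = 20_000_000
loaded, elapsed_s = 2_000_000, 5 * 3600   # Storm -> Cassandra/Couch/Solr after 5 hours
kafka_s = 5 * 60                          # all 20M records published to Kafka in 5 minutes

storm_rate = records_per_second(loaded, elapsed_s)        # ~111 records/s
kafka_rate = records_per_second(TOTAL_RECORDS, kafka_s)   # ~66,667 records/s

# Hours needed for the remaining records if the Storm rate stays flat.
remaining_h = (TOTAL_RECORDS - loaded) / storm_rate / 3600

print(f"Storm: {storm_rate:.0f} records/s")
print(f"Kafka: {kafka_rate:.0f} records/s")
print(f"Remaining at current pace: {remaining_h:.0f} h")   # 45 h
```

The ingest side is about 600 times faster than the write side, which is consistent with Michael's point that the slow part is the path into the external stores rather than Storm's own messaging.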
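The stats table supports a rough cross-check using Little's law (tuples in flight = throughput × latency). `topology.max.spout.pending` caps the un-acked tuples per spout task, so throughput per spout task cannot exceed max pending divided by the complete latency. The sketch below assumes a single spout task, because the thread does not show the Kafka spout's parallelism. With more spout tasks, scale by the real count.

```python
MAX_SPOUT_PENDING = 5000          # topology.max.spout.pending (applies per spout task)
COMPLETE_LATENCY_MS = 47305.488   # "Complete latency (ms)" from the stats table
SPOUT_TASKS = 1                   # ASSUMPTION: spout parallelism is not given in the thread

latency_s = COMPLETE_LATENCY_MS / 1000.0

# Little's law: in_flight = throughput * latency, and in_flight <= max pending,
# so throughput <= max_pending / latency for each spout task.
max_throughput = SPOUT_TASKS * MAX_SPOUT_PENDING / latency_s

print(f"Upper bound: {max_throughput:.0f} tuples/s")   # about 106 tuples/s
```

Under the single-spout-task assumption, the bound is in the same range as the ~110 writes/s Michael estimated. If the bolts ack only after their writes complete, the 47-second complete latency includes the write time, so it is the slow writes that cap throughput. This is an inference from the posted numbers, not something confirmed in the thread.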
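Michael's second suggestion, batching writes to external resources, can be illustrated independently of Storm. The sketch below assumes a hypothetical `write_batch` callable that stands in for one round trip to Cassandra, Couch or Solr, such as a single batched insert or bulk request. It is not the API of any of those clients. Buffering records and flushing them in groups turns N round trips into roughly N / batch_size.

```python
from typing import Callable, List, Sequence


class BatchingWriter:
    """Buffers records and hands them to `write_batch` in groups.

    `write_batch` is a hypothetical stand-in for one round trip to an
    external store (e.g. one batched insert); it is not a real client API.
    """

    def __init__(self, write_batch: Callable[[Sequence[str]], None], batch_size: int = 500):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def add(self, record: str) -> None:
        """Queue one record and flush automatically once the batch is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, including a partial batch."""
        if self._buffer:
            self._write_batch(list(self._buffer))  # pass a copy, then reset
            self._buffer.clear()


# Usage: record batch sizes instead of talking to a real database.
round_trips: List[int] = []
writer = BatchingWriter(lambda batch: round_trips.append(len(batch)), batch_size=4)
for i in range(10):
    writer.add(f"record-{i}")
writer.flush()  # push the final partial batch

print(round_trips)  # [4, 4, 2]: 3 round trips instead of 10
```

In a Storm bolt, the buffered tuples would typically be acked only after the flush succeeds. A partial batch can be flushed on a timer, and tick tuples (configured through `topology.tick.tuple.freq.secs`) are one way to do that. Each flush must happen well within `topology.message.timeout.secs` (60 s here), or the spout will replay the tuples.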