Below are my topology configuration and the topology stats based on that
configuration. Can anyone help me optimize Storm so that it processes the
20 million records faster?

Topology stats

Window       Emitted  Transferred  Complete latency (ms)  Acked, Failed
10m 0s       111420   111420       47305.488              2861000
3h 0m 0s     111420   111420       47305.488              2861000
1d 0h 0m 0s  111420   111420       47305.488              2861000
All time     111420   111420       47305.488              2861000

10m 0s: <http://nmcxstrmd001:8080/topology.html?id=CEXPStormTopology-1-1411442050&window=600>
3h 0m 0s: <http://nmcxstrmd001:8080/topology.html?id=CEXPStormTopology-1-1411442050&window=10800>
1d 0h 0m 0s: <http://nmcxstrmd001:8080/topology.html?id=CEXPStormTopology-1-1411442050&window=86400>
All time: <http://nmcxstrmd001:8080/topology.html?id=CEXPStormTopology-1-1411442050&window=:all-time>

Topology Configuration

Key  Value
dev.zookeeper.path  /tmp/dev-storm-zookeeper
drpc.childopts  -Xmx768m
drpc.invocations.port  3773
drpc.port  3772
drpc.queue.size  128
drpc.request.timeout.secs  600
drpc.worker.threads  64
java.library.path  /usr/local/lib:/opt/local/lib:/usr/lib
logviewer.appender.name  A1
logviewer.childopts  -Xmx128m
logviewer.port  8000
nimbus.childopts  -Xmx1024m
nimbus.cleanup.inbox.freq.secs  600
nimbus.file.copy.expiration.secs  600
nimbus.host  mystormserver
nimbus.inbox.jar.expiration.secs  3600
nimbus.monitor.freq.secs  10
nimbus.reassign  true
nimbus.supervisor.timeout.secs  60
nimbus.task.launch.secs  120
nimbus.task.timeout.secs  30
nimbus.thrift.max_buffer_size  1048576
nimbus.thrift.port  6627
nimbus.topology.validator  backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode  distributed
storm.config.properties  [object Object]
storm.id  CEXPStormTopology-1-1411442050
storm.local.dir  /data/disk00/storm/localdir
storm.local.mode.zmq  false
storm.messaging.netty.buffer_size  5242880
storm.messaging.netty.client_worker_threads  1
storm.messaging.netty.flush.check.interval.ms  10
storm.messaging.netty.max_retries  30
storm.messaging.netty.max_wait_ms  1000
storm.messaging.netty.min_wait_ms  100
storm.messaging.netty.server_worker_threads  1
storm.messaging.netty.transfer.batch.size  262144
storm.messaging.transport  backtype.storm.messaging.netty.Context
storm.thrift.transport  backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout  15000
storm.zookeeper.port  2181
storm.zookeeper.retry.interval  1000
storm.zookeeper.retry.intervalceiling.millis  30000
storm.zookeeper.retry.times  5
storm.zookeeper.root  /storm
storm.zookeeper.servers  mystormserver
storm.zookeeper.session.timeout  20000
supervisor.childopts  -Xmx256m
supervisor.enable  true
supervisor.heartbeat.frequency.secs  5
supervisor.monitor.frequency.secs  3
supervisor.slots.ports  6700,6701,6702,6703,6704,6705,6706,6707,6708,6709,6710,6711,6712,6713,6714,6715,6716,6717,6718,6719,6720,6721,6722,6723,6724,6725,6726,6727,6728
supervisor.worker.start.timeout.secs  120
supervisor.worker.timeout.secs  30
task.heartbeat.frequency.secs  3
task.refresh.poll.secs  10
topology.acker.executors  1000
topology.builtin.metrics.bucket.size.secs  60
topology.debug  true
topology.disruptor.wait.strategy  com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts  true
topology.error.throttle.interval.secs  10
topology.executor.receive.buffer.size  65536
topology.executor.send.buffer.size  65536
topology.fall.back.on.java.serialization  true
topology.kryo.decorators
topology.kryo.factory  backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register  [object Object]
topology.max.error.report.per.interval  5
topology.max.spout.pending  5000
topology.max.task.parallelism  100
topology.message.timeout.secs  60
topology.multilang.serializer  backtype.storm.multilang.JsonSerializer
topology.name  CEXPStormTopology
topology.receiver.buffer.size  8
topology.skip.missing.kryo.registrations  false
topology.sleep.spout.wait.strategy.time.ms  1
topology.spout.wait.strategy  backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs  60
topology.stats.sample.rate  0.05
topology.tasks
topology.tick.tuple.freq.secs
topology.transfer.buffer.size  32
topology.trident.batch.emit.interval.millis  500
topology.tuple.serializer  backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts
topology.worker.receiver.thread.count  1
topology.worker.shared.thread.pool.size  4
topology.workers  20
transactional.zookeeper.port
transactional.zookeeper.root  /transactional
transactional.zookeeper.servers
ui.childopts  -Xmx768m
ui.port  8080
worker.childopts  -Xmx1024m
worker.heartbeat.frequency.secs  1
zmq.hwm  0
zmq.linger.millis  5000
zmq.threads  1
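
As a rough sanity check on these numbers, the sketch below compares the load
rate reported earlier in this thread (about 2 million records in 5 hours) with
the ceiling that topology.max.spout.pending and the complete latency would
impose by Little's law (tuples in flight = throughput x latency). It assumes a
single spout task and a pending limit that is always full; neither is
confirmed anywhere in the thread, and the function names are only
illustrative.

```python
# Figures quoted from this thread; the single-spout-task assumption is mine.
RECORDS_LOADED = 2_000_000        # records loaded so far
ELAPSED_HOURS = 5                 # how long the topology had been running
TOTAL_RECORDS = 20_000_000        # size of the full data set
MAX_SPOUT_PENDING = 5000          # topology.max.spout.pending
COMPLETE_LATENCY_MS = 47305.488   # complete latency from the topology stats


def observed_rate(records, hours):
    """Average records per second actually achieved."""
    return records / (hours * 3600)


def pending_bound(max_pending, latency_ms, spout_tasks=1):
    """Little's law ceiling: tuples in flight divided by seconds per tuple.

    spout_tasks=1 is an assumption; max pending applies per spout task.
    """
    return spout_tasks * max_pending / (latency_ms / 1000.0)


def hours_to_finish(total_records, rate_per_sec):
    """Hours needed to load total_records at a constant rate."""
    return total_records / rate_per_sec / 3600


if __name__ == "__main__":
    rate = observed_rate(RECORDS_LOADED, ELAPSED_HOURS)
    bound = pending_bound(MAX_SPOUT_PENDING, COMPLETE_LATENCY_MS)
    print(f"observed rate: {rate:.1f} records/s")
    print(f"pending bound: {bound:.1f} records/s")
    print(f"full load at observed rate: {hours_to_finish(TOTAL_RECORDS, rate):.0f} hours")
```

Under those assumptions both figures land close to 110 records/s, which would
suggest the rate is limited by how long each tuple takes to complete rather
than by the Storm settings themselves.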


--
Kushan Maskey
817.403.7500

On Mon, Sep 22, 2014 at 9:25 PM, Kushan Maskey <
kushan.mas...@mmillerassociates.com> wrote:

> Here is my storm config.
>
>
> storm.config.setMaxTaskParallelism=4
>
> storm.config.setNumWorkers=20
>
> storm.config.setMaxSpoutPending=5000
>
> storm.config.numAckers=1000
>
>
> I am guessing I need to increase maxTaskParallelism. If that is the
> case, how much would you suggest? Any help will be highly appreciated.
>
>
> Thanks.
>
> --
> Kushan Maskey
> 817.403.7500
>
> On Mon, Sep 22, 2014 at 9:20 PM, Michael Rose <mich...@fullcontact.com>
> wrote:
>
>> Storm is not your bottleneck. Check your Storm code to ensure you're
>> 1) parallelizing your writes and 2) batching writes to your external
>> resources if possible. Some quick napkin math shows you're only doing
>> about 110 writes/s, which seems awfully low.
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> mich...@fullcontact.com
>>
>> On Mon, Sep 22, 2014 at 8:05 PM, Kushan Maskey <
>> kushan.mas...@mmillerassociates.com> wrote:
>>
>>> I am trying to load 20 million records into a Cassandra database through
>>> Kafka and Storm. I can post all the data into Kafka in 5 minutes, but
>>> reading it from Storm and inserting it into Cassandra, Couch and Solr is
>>> very slow. It has been running for the past 5 hours and has loaded only
>>> 2 million records so far.
>>>
>>> How do I make Storm perform faster? At this pace it will take a
>>> couple of days to load all the data.
>>>
>>> --
>>> Kushan Maskey
>>>
>>>
>>
>
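
The advice above to batch writes to external resources could look roughly
like the following. This is a minimal sketch in plain Python, not Storm or
Cassandra API code: BatchingWriter and the write_batch callback are
hypothetical names standing in for whatever bulk-insert call the target store
provides, and the batch size of 500 is an arbitrary placeholder.

```python
from typing import Callable, List


class BatchingWriter:
    """Buffers records and hands them to write_batch in groups."""

    def __init__(self, write_batch: Callable[[List[dict]], None],
                 batch_size: int = 500):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._write_batch = write_batch   # hypothetical bulk-write callback
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def add(self, record: dict) -> None:
        """Queue one record; write the batch once it is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)


if __name__ == "__main__":
    written: List[List[dict]] = []
    writer = BatchingWriter(written.append, batch_size=3)
    for i in range(7):
        writer.add({"id": i})
    writer.flush()  # push out the final, partly filled batch
    print([len(batch) for batch in written])
```

In a real bolt, tuples would be acked only after their batch has been
written, and a periodic flush would be needed so that a partly filled batch
does not sit longer than topology.message.timeout.secs.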
