Thanks for the reply, guys.
While deploying the topology, CPU load is around zero. Once the topology is deployed
and starts working, it rises to about 25%, so it's not CPU overload or a cgroup limit.
The network is able to transfer 150MB in less than a second; I checked it with
SCP. While the topology is deploying there is no major network activity; I watched it
in Ambari.
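
For reference, the SCP check was roughly along these lines (the test file and
host name here are placeholders, not the exact ones used):

    dd if=/dev/zero of=/tmp/test150m bs=1M count=150   # make a ~150MB file
    time scp /tmp/test150m <supervisor-host>:/tmp/      # copy it to a supervisor node and time it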
The JVM settings are the Storm defaults, except the worker Xmx was increased to 2048m
(768m is the default). No OutOfMemoryError or other exceptions were thrown in the
process.
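
In storm.yaml terms that is just the worker heap option; roughly (the actual
childopts string on this cluster may carry extra flags, this is only the part
that changed):

    worker.childopts: "-Xmx2048m"    # Storm 0.10 default is -Xmx768m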

I doubt that it's a hardware issue, because everything worked fine with the old
version of HDP.

On Sun, Mar 27, 2016 at 11:22 PM, Erik Weathers <[email protected]>
wrote:

> Igor, *maybe* you have the supervisor in a cgroup that is limiting the CPU
> for the process?  (I don't know anything about the HDP-packaged Storm.)
>
> - Erik
>
> On Sun, Mar 27, 2016 at 11:30 AM, Andrey Dudin <[email protected]>
> wrote:
>
>> Hi Igor.
>> Try taking a thread dump and looking at it; I think you will find the problem
>> in the dump.
>>
>> Also check the LAN, CPU and RAM load. Maybe your CPU, RAM, or LAN is overloaded.
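
A minimal sketch of that thread-dump step, assuming the JDK's jps/jstack are
available on the supervisor host (the grep pattern and output path are only
illustrative):

    jps -l | grep -i storm                  # find the supervisor / worker JVM pids
    jstack <pid> > /tmp/storm-threads.txt   # dump the threads of the stuck JVM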
>>
>> 2016-03-27 21:26 GMT+03:00 Igor Kuzmenko <[email protected]>:
>>
>>> Hello, I'm using Hortonworks Data Platform v2.3.4 with the included Storm
>>> 0.10.
>>> After deploying my topology with the "*storm jar*" command, it takes about
>>> 3 min to distribute the code. The network is 10Gb/s, the topology jar is about
>>> 150MB, and the cluster has 2 nodes with supervisors on them, so I assume this
>>> process shouldn't take much time. Actually, it didn't with Hortonworks
>>> platform 2.3.0, and I can't understand what has changed since then. Here's the
>>> change log
>>> <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/content/patch_storm.html>
>>> of HDP; I hope you can help.
>>>
>>> Here are the logs.
>>> Nimbus downloads the jar pretty fast:
>>> *2016-02-25 17:10:13.040 b.s.d.nimbus [INFO] Uploading file from client
>>> to
>>> /opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar*
>>> *2016-02-25 17:10:14.915 b.s.d.nimbus [INFO] Finished uploading file
>>> from client:
>>> /opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar*
>>> 2016-02-25 17:10:14.928 b.s.d.nimbus [INFO] [req 22] Access from:
>>> principal: op:submitTopology
>>> 2016-02-25 17:10:14.933 b.s.d.nimbus [INFO] Received topology submission
>>> for CdrPerformanceTestTopology-phoenix-bolt-4 with conf
>>> {"topology.max.task.parallelism" nil, "hbase.conf" {"hbase.rootdir" "hdfs://
>>> sorm-master02.msk.mts.ru:8020/apps/hbase/data"},
>>> "topology.submitter.principal" "", "topology.acker.executors" nil,
>>> "topology.workers" 2, "topology.message.timeout.secs" 30, "topology.debug"
>>> true, "topology.max.spout.pending" 10000, "storm.zookeeper.superACL" nil,
>>> "topology.users" (), "topology.submitter.user" "", "topology.kryo.register"
>>> {"org.apache.avro.util.Utf8" nil,
>>> "ru.mts.sorm.schemes.avro.cdr.CbossCdrRecord" nil},
>>> "topology.kryo.decorators" (), "storm.id"
>>> "CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414", "topology.name"
>>> "CdrPerformanceTestTopology-phoenix-bolt-4"}
>>> 2016-02-25 17:10:14.936 b.s.d.nimbus [INFO] nimbus file
>>> location:/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
>>> 2016-02-25 17:10:15.160 b.s.c.LocalFileSystemCodeDistributor [INFO]
>>> Created meta file
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/storm-code-distributor.meta
>>> upload successful.
>>> 2016-02-25 17:10:15.164 b.s.zookeeper [INFO] Node already in queue for
>>> leader lock.
>>> 2016-02-25 17:10:15.165 b.s.d.nimbus [INFO] desired replication count of
>>> 1 not achieved but we have hit the max wait time 60 so moving on with
>>> replication count = 1
>>> 2016-02-25 17:10:15.169 b.s.d.nimbus [INFO] Activating
>>> CdrPerformanceTestTopology-phoenix-bolt-4:
>>> CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
>>> 2016-02-25 17:10:15.186 b.s.s.EvenScheduler [INFO] Available slots:
>>> (["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6701]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6701])
>>> 2016-02-25 17:10:15.189 b.s.d.nimbus [INFO] Setting new assignment for
>>> topology id CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414:
>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>> "/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414",
>>> :node->host {"edef7b4d-2358-41c6-8a03-a7638d0f68c6" "
>>> sorm-data06.msk.mts.ru", "0a6e4de5-dd83-4307-b7e6-1db60c4c0898" "
>>> sorm-data07.msk.mts.ru"}, :executor->node+port {[8 8]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [2 2]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [7 7]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [3 3]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [1 1]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [6 6]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [9 9]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [11 11]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [5 5]
>>> ["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [10 10]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [4 4]
>>> ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700]}, :executor->start-time-secs
>>> {[8 8] 1456409415, [2 2] 1456409415, [7 7] 1456409415, [3 3] 1456409415, [1
>>> 1] 1456409415, [6 6] 1456409415, [9 9] 1456409415, [11 11] 1456409415, [5
>>> 5] 1456409415, [10 10] 1456409415, [4 4] 1456409415}}
>>>
>>> But the supervisor gets stuck somewhere:
>>> *2016-02-25 17:10:15.198 b.s.d.supervisor [INFO] Downloading code for
>>> storm id* CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
>>> 2016-02-25 17:10:15.199 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
>>> The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
>>> 2016-02-25 17:10:15.205 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
>>> The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
>>> 2016-02-25 17:10:17.405 b.s.c.LocalFileSystemCodeDistributor [INFO]
>>> Attempting to download meta file
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormjar.jar
>>> from remote sorm-master02.msk.mts.ru:6627
>>> 2016-02-25 *17:10:17.406* b.s.u.StormBoundedExponentialBackoffRetry
>>> [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries
>>> [5]
>>> 2016-02-25 *17:13:29.002* b.s.c.LocalFileSystemCodeDistributor [INFO]
>>> Attempting to download meta file
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormconf.ser
>>> from remote sorm-master02.msk.mts.ru:6627
>>> 2016-02-25 17:13:29.003 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
>>> The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
>>> 2016-02-25 17:13:31.148 b.s.c.LocalFileSystemCodeDistributor [INFO]
>>> Attempting to download meta file
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormcode.ser
>>> from remote sorm-master02.msk.mts.ru:6627
>>> 2016-02-25 17:13:31.148 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
>>> The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
>>> *2016-02-25 17:13:34.124 b.s.d.supervisor [INFO] Finished downloading
>>> code for storm id*
>>> CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from
>>> /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
>>>
>>>
>>>
>>
>>
>> --
>> Best regards, Andrey Dudin
>>
>
>
