The nimbus log will tell you which port the worker was started on (look for
the worker hash; it gives the supervisor node and port assignments, but
requires some decoding). Then take a look at the worker log. Maybe your
initialization is taking too long?
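
If it is, one thing to try (just a sketch; 300 is only an illustrative value)
is raising the start timeouts in the supervisors' storm.yaml, since both
default to 120 seconds in the conf dump below:

supervisor.worker.start.timeout.secs: 300
nimbus.task.launch.secs: 300

Otherwise a slow worker gets killed and relaunched before it ever heartbeats,
which matches the "State: :not-started, Heartbeat: nil" line in your log.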

On Thu, Jun 25, 2015 at 11:06 AM, Nick R. Katsipoulakis <
[email protected]> wrote:

> Yes, I see the following message which I have not seen before:
>
> 2015-06-24T19:05:28.745+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
> 2015-06-24T19:05:29.245+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
> 2015-06-24T19:05:29.746+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
> 2015-06-24T19:05:30.246+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
> 2015-06-24T19:05:30.646+0000 b.s.d.supervisor [INFO] Removing code for
> storm id tpch-q5-top-5-1435172243
> 2015-06-24T19:05:30.747+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
> 2015-06-24T19:05:31.247+0000 b.s.d.supervisor [INFO]
> fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>
> 2015-06-24T19:06:50.327+0000 b.s.d.supervisor [INFO] Worker
> fa3de772-cc61-4394-97e2-fcbd85190dd4 failed to start
> 2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down and
> clearing state for id fa3de772-cc61-4394-97e2-fcbd85190dd4. Current
> supervisor time: 1435172810. State: :not-started, Heartbeat: nil
> 2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down
> 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
> 2015-06-24T19:06:50.330+0000 b.s.d.supervisor [INFO] Shut down
> 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
> 2015-06-24T19:08:39.743+0000 b.s.d.supervisor [INFO] Shutting down
> supervisor 58e551ba-f944-4aec-9c8f-5621053021dd
> 2015-06-24T19:08:39.745+0000 b.s.event [INFO] Event manager interrupted
> 2015-06-24T19:08:39.745+0000 b.s.event [INFO] Event manager interrupted
> 2015-06-24T19:08:39.748+0000 o.a.s.z.ZooKeeper [INFO] Session:
> 0x24e26a304b50025 closed
> 2015-06-24T19:08:39.748+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut
> down
>
> But there is no indication of why the above is happening.
>
> Thanks,
> Nick
>
> 2015-06-25 10:52 GMT-04:00 Nathan Leung <[email protected]>:
>
>> Any problems in supervisor or nimbus logs?
>>
>> On Thu, Jun 25, 2015 at 10:49 AM, Nick R. Katsipoulakis <
>> [email protected]> wrote:
>>
>>> I am using m4.xlarge instances, each one with 4 workers per supervisor.
>>> Yes, they are listed.
>>>
>>> Nick
>>>
>>> 2015-06-25 10:47 GMT-04:00 Nathan Leung <[email protected]>:
>>>
>>>> How big are your EC2 instances?  Are your supervisors listed in the
>>>> storm UI?
>>>>
>>>> On Thu, Jun 25, 2015 at 10:43 AM, Nick R. Katsipoulakis <
>>>> [email protected]> wrote:
>>>>
>>>>> Nathan,
>>>>>
>>>>> I attempted to put the following line
>>>>>
>>>>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>> -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70
>>>>> -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
>>>>>
>>>>> in the supervisor config files, but for some reason no workers were
>>>>> spawned on those machines. To be more precise, I submitted my topology
>>>>> (with storm jar...) and waited for it to start executing, but nothing
>>>>> happened. Any ideas what the reason might have been?
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>>> 2015-06-25 10:39 GMT-04:00 Nathan Leung <[email protected]>:
>>>>>
>>>>>> In general worker options need to be set in the supervisor config
>>>>>> files.
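>>>>>>
>>>>>> For example (just a sketch, with the heap size being whatever you
>>>>>> actually want per worker), each supervisor's storm.yaml would carry an
>>>>>> entry along these lines, followed by a restart of that supervisor so
>>>>>> the setting is picked up:
>>>>>>
>>>>>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC"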
>>>>>>
>>>>>> On Thu, Jun 25, 2015 at 10:07 AM, Nick R. Katsipoulakis <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hello sy.pan
>>>>>>>
>>>>>>> Thank you for the link. I will try the suggestions.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Nick
>>>>>>>
>>>>>>> 2015-06-24 22:35 GMT-04:00 sy.pan <[email protected]>:
>>>>>>>
>>>>>>>> FYI:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://mail-archives.apache.org/mod_mbox/storm-user/201504.mbox/%3ccafbccrcadux8sl8d99tomrbg9hkmo3gkg-qdv-qkmc-6zxs...@mail.gmail.com%3E
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 25, 2015, at 02:14, Nick R. Katsipoulakis <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am working on an EC2 Storm cluster, and I want the workers on the
>>>>>>>> supervisor machines to use 4 GB of memory, so I added the following
>>>>>>>> line on the machine that hosts the nimbus:
>>>>>>>>
>>>>>>>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>>>>> -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70
>>>>>>>> -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
>>>>>>>>
>>>>>>>> However, when I look at the workers' logs (on each of the other
>>>>>>>> machines that run a supervisor), I do not find the above options in
>>>>>>>> the part that launches the worker with the given arguments. Instead,
>>>>>>>> I find the following line:
>>>>>>>>
>>>>>>>> 2015-06-24T17:52:45.349+0000 b.s.d.worker [INFO] Launching worker
>>>>>>>> for tpch-q5-top-2-1435168361 on 
>>>>>>>> 5568726d-ad65-4a7c-ba52-32eed83276ad:6703
>>>>>>>> with id 829f36fc-eeb9-4eef-ae89-9fb6565e9108 and conf 
>>>>>>>> {"dev.zookeeper.path"
>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>> "topology.skip.missing.kryo.registrations" false,
>>>>>>>> "storm.messaging.netty.client_worker_threads" 4, "ui.childopts" 
>>>>>>>> "-Xmx768m",
>>>>>>>> "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true,
>>>>>>>> "topology.trident.batch.emit.interval.millis" 500, "
>>>>>>>> storm.messaging.netty.flush.check.interval.ms" 10,
>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", 
>>>>>>>> "storm.home"
>>>>>>>> "/opt/apache-storm-0.9.4", "topology.executor.send.buffer.size" 1024,
>>>>>>>> "storm.local.dir" "/mnt/storm", "storm.messaging.netty.buffer_size"
>>>>>>>> 10485760, "supervisor.worker.start.timeout.secs" 120,
>>>>>>>> "topology.enable.message.timeouts" true, 
>>>>>>>> "nimbus.cleanup.inbox.freq.secs"
>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>> "storm.meta.serialization.delegate"
>>>>>>>> "backtype.storm.serialization.DefaultSerializationDelegate",
>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" 
>>>>>>>> "52.25.74.163",
>>>>>>>> "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2181,
>>>>>>>> "transactional.zookeeper.port" nil, 
>>>>>>>> "topology.executor.receive.buffer.size"
>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>> "supervisor.enable" true, 
>>>>>>>> "storm.messaging.netty.server_worker_threads" 4,
>>>>>>>> "storm.zookeeper.servers" ["172.31.28.73" "172.31.38.251" 
>>>>>>>> "172.31.38.252"],
>>>>>>>> "transactional.zookeeper.root" "/transactional", 
>>>>>>>> "topology.acker.executors"
>>>>>>>> nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" 
>>>>>>>> nil,
>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m",
>>>>>>>> "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 
>>>>>>>> 3772,
>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>> "topology.tasks" nil, "storm.messaging.netty.max_retries" 100,
>>>>>>>> "topology.spout.wait.strategy"
>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy",
>>>>>>>> "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" 
>>>>>>>> nil,
>>>>>>>> "storm.zookeeper.retry.interval" 1000, "
>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>> "nimbus.topology.validator"
>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", 
>>>>>>>> "supervisor.slots.ports"
>>>>>>>> [6700 6701 6702 6703], "topology.environment" nil, "topology.debug" 
>>>>>>>> false,
>>>>>>>> "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60,
>>>>>>>> "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10,
>>>>>>>> "topology.workers" 1, "supervisor.childopts" "-Xmx256m",
>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", 
>>>>>>>> "topology.multilang.serializer"
>>>>>>>> "backtype.storm.multilang.JsonSerializer", "nimbus.task.timeout.secs" 
>>>>>>>> 30,
>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", 
>>>>>>>> "drpc.invocations.port"
>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, 
>>>>>>>> "storm.zookeeper.retry.times"
>>>>>>>> 5, "topology.worker.receiver.thread.count" 1, "storm.thrift.transport"
>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>> 600, "storm.messaging.transport" 
>>>>>>>> "backtype.storm.messaging.netty.Context", "
>>>>>>>> logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms"
>>>>>>>> 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false,
>>>>>>>> "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode"
>>>>>>>> "distributed", "topology.max.task.parallelism" nil,
>>>>>>>> "storm.messaging.netty.transfer.batch.size" 262144, 
>>>>>>>> "topology.classpath"
>>>>>>>> nil}
>>>>>>>>
>>>>>>>> which, as you can see, uses topology.worker.childopts: nil and
>>>>>>>> worker.childopts: -Xmx768m. My question is the following: do I need
>>>>>>>> to add the above line to the storm.yaml files of my supervisor nodes
>>>>>>>> in order to allow each worker JVM to use up to 4 GB of memory? Also,
>>>>>>>> am I setting the right value for what I am trying to achieve?
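>>>>>>>>
>>>>>>>> Or, since the dump above also shows topology.worker.childopts: nil,
>>>>>>>> could I instead set that option in the topology configuration when I
>>>>>>>> submit? Something like the following (just my guess at the syntax,
>>>>>>>> assuming it gets appended to the supervisor-side worker.childopts):
>>>>>>>>
>>>>>>>> topology.worker.childopts: "-Xmx4096m"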
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nick
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Nikolaos Romanos Katsipoulakis,
>>>>>>> University of Pittsburgh, PhD candidate
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nikolaos Romanos Katsipoulakis,
>>>>> University of Pittsburgh, PhD candidate
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nikolaos Romanos Katsipoulakis,
>>> University of Pittsburgh, PhD candidate
>>>
>>
>>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
>
