I see. Well, I took a look at nimbus.log and everything there looks fine, so it is still strange that this is happening. On top of that, another odd thing is that all my bolts are placed on the same supervisor and in the same worker (which does not seem like a smart assignment for Storm to make). My topology defines a total parallelism hint of 23 tasks, and I have 4 supervisor nodes, each with 4 worker processes.
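[One plausible reason everything lands in a single worker is that topology.workers defaults to 1 (as the conf dump quoted later in this thread shows), so all 23 executors are packed into one JVM regardless of the parallelism hint. A minimal sketch of a topology-level override; the worker count of 16 is an assumption based on the 4 nodes x 4 slots mentioned above:]

```yaml
# Sketch (assumed value): request 16 workers so Storm's default even
# scheduler can spread the executors across all 4 supervisors.
# The same setting can be made in topology code via Config.setNumWorkers(16).
topology.workers: 16
```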
Nick

2015-06-25 11:22 GMT-04:00 Nathan Leung <[email protected]>:

> I'm not sure, but if I had to wager a guess, the former is set on the supervisor and will be applied to all topologies run on that supervisor, whereas the latter is set per topology.
>
> On Thu, Jun 25, 2015 at 11:19 AM, Nick R. Katsipoulakis <[email protected]> wrote:
>
>> I see. I will try to debug and see what's going on. Also, what is the difference between worker.childopts and topology.worker.childopts?
>>
>> Thanks,
>> Nick
>>
>> 2015-06-25 11:10 GMT-04:00 Nathan Leung <[email protected]>:
>>
>>> The nimbus log will tell you which port the worker was started on (look for the worker hash; it will give supervisor node and port assignments, but requires some decoding). Then take a look at the worker log. Maybe your initialization is taking too long?
>>>
>>> On Thu, Jun 25, 2015 at 11:06 AM, Nick R. Katsipoulakis <[email protected]> wrote:
>>>
>>>> Yes, I see the following messages, which I have not seen before:
>>>>
>>>> 2015-06-24T19:05:28.745+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>> 2015-06-24T19:05:29.245+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>> 2015-06-24T19:05:29.746+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>> 2015-06-24T19:05:30.246+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>> 2015-06-24T19:05:30.646+0000 b.s.d.supervisor [INFO] Removing code for storm id tpch-q5-top-5-1435172243
>>>> 2015-06-24T19:05:30.747+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>> 2015-06-24T19:05:31.247+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
>>>>
>>>> 2015-06-24T19:06:50.327+0000 b.s.d.supervisor [INFO] Worker fa3de772-cc61-4394-97e2-fcbd85190dd4 failed to start
>>>> 2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down and clearing state for id fa3de772-cc61-4394-97e2-fcbd85190dd4. Current supervisor time: 1435172810. State: :not-started, Heartbeat: nil
>>>> 2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
>>>> 2015-06-24T19:06:50.330+0000 b.s.d.supervisor [INFO] Shut down 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
>>>> 2015-06-24T19:08:39.743+0000 b.s.d.supervisor [INFO] Shutting down supervisor 58e551ba-f944-4aec-9c8f-5621053021dd
>>>> 2015-06-24T19:08:39.745+0000 b.s.event [INFO] Event manager interrupted
>>>> 2015-06-24T19:08:39.745+0000 b.s.event [INFO] Event manager interrupted
>>>> 2015-06-24T19:08:39.748+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x24e26a304b50025 closed
>>>> 2015-06-24T19:08:39.748+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
>>>>
>>>> But there is no indication of why this is happening.
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>> 2015-06-25 10:52 GMT-04:00 Nathan Leung <[email protected]>:
>>>>
>>>>> Any problems in the supervisor or nimbus logs?
>>>>>
>>>>> On Thu, Jun 25, 2015 at 10:49 AM, Nick R. Katsipoulakis <[email protected]> wrote:
>>>>>
>>>>>> I am using m4.xlarge instances, each one with 4 workers per supervisor. Yes, they are listed.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> 2015-06-25 10:47 GMT-04:00 Nathan Leung <[email protected]>:
>>>>>>
>>>>>>> How big are your EC2 instances? Are your supervisors listed in the Storm UI?
>>>>>>>
>>>>>>> On Thu, Jun 25, 2015 at 10:43 AM, Nick R. Katsipoulakis <[email protected]> wrote:
>>>>>>>
>>>>>>>> Nathan,
>>>>>>>>
>>>>>>>> I attempted to put the following line
>>>>>>>>
>>>>>>>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
>>>>>>>>
>>>>>>>> in the supervisor config files, but for some reason workers were not spawned on those machines. To be more precise, I submitted my topology (with storm jar ...) and just waited for it to start executing, but nothing happened. Any idea what the reason might have been?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> 2015-06-25 10:39 GMT-04:00 Nathan Leung <[email protected]>:
>>>>>>>>
>>>>>>>>> In general, worker options need to be set in the supervisor config files.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 25, 2015 at 10:07 AM, Nick R. Katsipoulakis <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello sy.pan,
>>>>>>>>>>
>>>>>>>>>> Thank you for the link. I will try the suggestions.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>> 2015-06-24 22:35 GMT-04:00 sy.pan <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> FYI:
>>>>>>>>>>>
>>>>>>>>>>> https://mail-archives.apache.org/mod_mbox/storm-user/201504.mbox/%3ccafbccrcadux8sl8d99tomrbg9hkmo3gkg-qdv-qkmc-6zxs...@mail.gmail.com%3E
>>>>>>>>>>>
>>>>>>>>>>> On 25 June 2015, at 02:14, Nick R. Katsipoulakis <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> I am working on an EC2 Storm cluster, and I want the workers on the supervisor machines to use 4 GB of memory, so I added the following line on the machine that hosts the nimbus:
>>>>>>>>>>>
>>>>>>>>>>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
>>>>>>>>>>>
>>>>>>>>>>> However, when I look into the workers' logs (on each machine that runs a supervisor), I do not find the above options in the part that launches the worker with the given arguments. In fact, I find the following line:
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-24T17:52:45.349+0000 b.s.d.worker [INFO] Launching worker for tpch-q5-top-2-1435168361 on 5568726d-ad65-4a7c-ba52-32eed83276ad:6703 with id 829f36fc-eeb9-4eef-ae89-9fb6565e9108 and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "storm.messaging.netty.client_worker_threads" 4, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "storm.messaging.netty.flush.check.interval.ms" 10, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>> "storm.home" "/opt/apache-storm-0.9.4", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/mnt/storm", "storm.messaging.netty.buffer_size" 10485760, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "storm.meta.serialization.delegate" "backtype.storm.serialization.DefaultSerializationDelegate",
>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "52.25.74.163", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 4, "storm.zookeeper.servers" ["172.31.28.73" "172.31.38.251" "172.31.38.252"], "transactional.zookeeper.root" "/transactional",
>>>>>>>>>>> "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy",
>>>>>>>>>>> "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702 6703], "topology.environment" nil, "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1,
>>>>>>>>>>> "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "topology.multilang.serializer" "backtype.storm.multilang.JsonSerializer", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.worker.receiver.thread.count" 1, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "distributed", "topology.max.task.parallelism" nil, "storm.messaging.netty.transfer.batch.size" 262144, "topology.classpath" nil}
>>>>>>>>>>>
>>>>>>>>>>> As you can see, this uses topology.worker.childopts: nil and worker.childopts: -Xmx768m. My question is the following: do I need to add the above line to the storm.yaml files of my supervisor nodes in order to allow the JVM to use up to 4 GB of memory? Also, am I setting the right value for what I am trying to achieve?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Nikolaos Romanos Katsipoulakis,
>>>>>>>>>> University of Pittsburgh, PhD candidate
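[For reference, a supervisor-side storm.yaml fragment of the kind discussed in this thread, as it might look after cleanup; this is a sketch assuming Storm 0.9.4 defaults, and the GC flags simply mirror the ones quoted above. Note that worker.childopts must be a list of well-formed JVM flags: a malformed flag (e.g. a space after "-XX:" or a missing leading "-" on "-Djava...") makes the worker JVM exit immediately at launch, which the supervisor reports only as "still hasn't started" / "failed to start"; the real error lands in the worker's own log under the slot's port.]

```yaml
# Sketch of a supervisor-side storm.yaml fragment (assumed values).
# Each flag must be syntactically valid, or the worker JVM dies on startup.
worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"

# Optional: give slow-starting workers more time before the supervisor
# kills them (the conf dump above shows the default of 120 seconds).
supervisor.worker.start.timeout.secs: 180
```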
