Thank you Fan for your response. Can you please give me some more information on how to do what you suggest?
Regards, Nick 2015-06-25 16:14 GMT-04:00 Fan Jiang <[email protected]>: > I remember someone post the same issue yesterday. The problem is that your > host is somehow resolved as "Netty-Client-*", which is not pingable. You > may modify /etc/hosts to map the hostnames to IP addresses appropriately if > it is allowed. > > — > Sincerely, > Fan Jiang > > > On Thu, Jun 25, 2015 at 3:57 PM, Nick R. Katsipoulakis < > [email protected]> wrote: > >> Hello all, >> >> I apologize for the long message, but I have no idea what is going wrong >> in my setup and I tried to give a lot of info about my cluster. I have the >> following EC2 setup: >> >> 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and a nimbus >> >> 2) 4x m4.xlarge nodes for my Supervisors. >> >> All of the machines are running Ubuntu Linux v14, OpenJDK v1.7 and Apache >> Storm v0.9.4. The storm.yaml I am currently having in the nimbus node >> (only) has the following values: >> >> storm.home: "/opt/apache-storm-0.9.4" >> storm.local.dir: "/mnt/storm" >> storm.zookeeper.servers: >> - "172.31.28.73" >> - "172.31.38.251" >> - "172.31.38.252" >> storm.zookeeper.port: 2181 >> storm.zookeeper.root: "/storm" >> storm.zookeeper.session.timeout: 20000 >> storm.zookeeper.connection.timeout: 15000 >> storm.zookeeper.retry.times: 5 >> storm.zookeeper.retry.interval: 1000 >> storm.zookeeper.retry.invervalceiling.millis: 30000 >> storm.cluster.mode: "distributed" >> storm.local.mode.zmq: false >> storm.thrift.transport: >> "backtype.storm.security.auth.SimpleTransportPlugin" >> storm.messaging.transport: "backtype.storm.messaging.netty.Context" >> >> nimbus.host: "127.0.0.1" >> nimbus.thrift.port: 6627 >> nimbus.thrift.max_buffer_size: 1048576 >> nimbus.thrift.threads: 256 >> nimbus.childopts: "-Xmx256m" >> nimbus.task.timeout.secs: 30 >> nimbus.supervisor.timeout.secs: 60 >> nimbus.monitor.freq.secs: 10 >> nimbus.cleanup.inbox.freq.secs: 600 >> nimbus.inbox.jar.expiration.secs: 3600 >> nimbus.task.launch.secs: 120 >> nimbus.reassign: true >> nimbus.file.copy.expiration.secs: 600 >> nimbus.topology.validator: >> "backtype.storm.nimbus.DefaultTopologyValidator" >> >> ui.port: 8080 >> ui.childopts: "-Xmx768m" >> logviewer.port: 8000 >> logviewer.childopts: "-Xmx256m" >> logviewer.appender.name: "A1" >> >> drpc.port: 3772 >> drpc.worker.threads: 64 >> drpc.queue.size: 128 >> drpc.invocations.port: 3773 >> drpc.request.timeout.secs: 600 >> drpc.childopts: "-Xmx768m" >> >> transactional.zookeeper.root: "/transactional" >> transactional.zookeeper.servers: null >> transactional.zookeeper.port: null >> >> supervisor.slots.ports: >> - 6700 >> - 6701 >> - 6702 >> - 6703 >> supervisor.childopts: "-Xmx256m" >> supervisor.worker.start.timeout.secs: 120 >> supervisor.worker.timeout.secs: 30 >> supervisor.monitor.frequency.secs: 3 >> supervisor.heartbeat.frequency.secs: 5 >> supervisor.enable: true >> >> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC >> -XX:+UseConcMarkSweepGC -XX:NewSize=128m >> -XX:CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled >> -Djava.net.preferIPv4Stack=true" >> worker.heartbeat.frequency.secs: 1 >> >> task.heartbeat.frequency.secs: 3 >> task.refresh.poll.secs: 10 >> >> zmq.threads: 1 >> zmq.linger.millis: 5000 >> zmq.hwm: 0 >> >> storm.messaging.netty.server_worker_threads: 4 >> storm.messaging.netty.client_worker_threads: 4 >> storm.messaging.netty.buffer_size: 10485760 >> storm.messaging.netty.max_retries: 100 >> storm.messaging.netty.max_wait_ms: 1000 >> storm.messaging.netty.min_wait_ms: 100 >> topology.enable.message.timeouts: true >> topology.debug: false >> topology.optimize: true >> topology.workers: 1 >> topology.acker.executors: null >> topology.tasks: null >> topology.message.timeout.secs: 30 >> topology.skip.missing.kryo.registrations: false >> topology.max.task.parallelism: null >> topology.max.spout.pending: null >> topology.state.synchronization.timeout.secs: 60 >> topology.stats.sample.rate: 0.05 >> topology.builtin.metrics.bucket.size.secs: 60 >> topology.fall.back.on.java.serialization: true >> topology.worker.childopts: null >> topology.executor.receive.buffer.size: 1024 >> topology.executor.send.buffer.size: 1024 >> topology.receiver.buffer.size: 8 >> topology.transfer.buffer.size: 1024 >> topology.tick.tuple.freq.secs: null >> topology.worker.shared.thread.pool.size: 4 >> topology.disruptor.wait.strategy: >> "com.lmax.disruptor.BlockingWaitStrategy" >> topology.spout.wait.strategy: >> "backtype.storm.spout.SleepSpoutWaitStrategy" >> topology.sleep.spout.wait.strategy.time.ms: 1 >> topology.error.throttle.interval.secs: 10 >> topology.max.error.report.per.interval: 5 >> topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory" >> topology.tuple.serializer: >> "backtype.storm.serialization.types.ListDelegateSerializer" >> topology.trident.batch.emit.interval.millis: 500 >> >> dev.zookeeper.path: "/tmp/dev-storm-zookeeper" >> >> The problem is that every time I submit a topology, I got a lot of Netty >> messages in my worker logs (found in the supervisor machines) and >> many of them had similar to the following messages: >> >> 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries >> [5] >> 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] >> Starting >> 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client >> connection, connectString=172.31.28.73:2181,172.31.38.251:2181, >> 172.31.38.252:2181 sessionTimeout=20000 >> watcher=org.apache.storm.curator.ConnectionState@5172aa5a >> 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket >> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection >> established to 172.31.28.73/172.31.28.73:2181, initiating session >> 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session >> establishment complete on server 172.31.28.73/172.31.28.73:2181, >> sessionid = 0x14e2b0caa01005f, negotiated timeout = 20000 >> 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] >> State change: CONNECTED >> 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update: >> :connected:none >> 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut >> down >> 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session: >> 0x14e2b0caa01005f closed >> 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries >> [5] >> 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] >> Starting >> 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client >> connection, connectString=172.31.28.73:2181,172.31.38.251:2181, >> 172.31.38.252:2181/storm sessionTimeout=20000 >> watcher=org.apache.storm.curator.ConnectionState@3f308697 >> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket >> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection >> established to 172.31.28.73/172.31.28.73:2181, initiating session >> 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session >> establishment complete on server 172.31.28.73/172.31.28.73:2181, >> sessionid = 0x14e2b0caa010061, negotiated timeout = 20000 >> 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] >> State change: CONNECTED >> 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments. >> 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer >> transport plugin:backtype.storm.messaging.netty.Context >> 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread >> for 58e551ba-f944-4aec-9c8f-5621053021dd:6703 >> 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server >> Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4 >> 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread: >> [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ] >> 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6703 scheduled to run in 0 ms >> 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6703 scheduled to run in 103 ms >> 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6703 scheduled to run in 0 ms >> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6702 scheduled to run in 0 ms >> 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6702 scheduled to run in 0 ms >> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6701 scheduled to run in 0 ms >> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client, >> connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize: >> 10485760 >> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6703 scheduled to run in 103 ms >> 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] >> maxRetries too large (100). Pinning to 29 >> >> and >> >> 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry >> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries >> [100] >> 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1 >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6700 scheduled to run in 0 ms >> 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6700 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6700 scheduled to run in 103 ms >> 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6703 scheduled to run in 105 ms >> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6700 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6700 scheduled to run in 102 ms >> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6702 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6702 scheduled to run in 102 ms >> 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6700 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6700 scheduled to run in 103 ms >> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6703 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6702 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6703 scheduled to run in 103 ms >> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established >> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ >> 172.31.19.252:6701 >> 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2 >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6702 scheduled to run in 102 ms >> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established >> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ >> 172.31.19.253:6701 >> 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3 >> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/ >> 172.31.13.184:6703 scheduled to run in 105 ms >> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6701 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6701 scheduled to run in 105 ms >> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6702 failed: java.lang.RuntimeException: Returned channel >> was actually not established >> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3 >> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/ >> 172.31.19.254:6702 scheduled to run in 106 ms >> 2015-06-25T19:42:33.275+0000 b.s >> >> Why am I getting the above. Initially, I thought that the input rate of >> tuples in my topology is too high, and Netty's buffers are filled up too >> fast. However, I submitted a debug topology >> that sent one tuple every 1 second and I still got the above messages. >> >> Am I doing something wrong in my configuration? Why do I have the >> previous Netty messages, which obviously show that something is going >> wrong? Please, any hint on my setup will be really helpful. >> >> Regards, >> Nick >> > > -- Nikolaos Romanos Katsipoulakis, University of Pittsburgh, PhD candidate
