I remember someone post the same issue yesterday. The problem is that your host is somehow resolved as "Netty-Client-*", which is not pingable. You may modify /etc/hosts to map the hostnames to IP addresses appropriately if it is allowed.
— Sincerely, Fan Jiang On Thu, Jun 25, 2015 at 3:57 PM, Nick R. Katsipoulakis <[email protected]> wrote: > Hello all, > I apologize for the long message, but I have no idea what is going wrong in > my setup and I tried to give a lot of info about my cluster. I have the > following EC2 setup: > 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and a nimbus > 2) 4x m4.xlarge nodes for my Supervisors. > All of the machines are running Ubuntu Linux v14, OpenJDK v1.7 and Apache > Storm v0.9.4. The storm.yaml I am currently having in the nimbus node > (only) has the following values: > storm.home: "/opt/apache-storm-0.9.4" > storm.local.dir: "/mnt/storm" > storm.zookeeper.servers: > - "172.31.28.73" > - "172.31.38.251" > - "172.31.38.252" > storm.zookeeper.port: 2181 > storm.zookeeper.root: "/storm" > storm.zookeeper.session.timeout: 20000 > storm.zookeeper.connection.timeout: 15000 > storm.zookeeper.retry.times: 5 > storm.zookeeper.retry.interval: 1000 > storm.zookeeper.retry.invervalceiling.millis: 30000 > storm.cluster.mode: "distributed" > storm.local.mode.zmq: false > storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin" > storm.messaging.transport: "backtype.storm.messaging.netty.Context" > nimbus.host: "127.0.0.1" > nimbus.thrift.port: 6627 > nimbus.thrift.max_buffer_size: 1048576 > nimbus.thrift.threads: 256 > nimbus.childopts: "-Xmx256m" > nimbus.task.timeout.secs: 30 > nimbus.supervisor.timeout.secs: 60 > nimbus.monitor.freq.secs: 10 > nimbus.cleanup.inbox.freq.secs: 600 > nimbus.inbox.jar.expiration.secs: 3600 > nimbus.task.launch.secs: 120 > nimbus.reassign: true > nimbus.file.copy.expiration.secs: 600 > nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator" > ui.port: 8080 > ui.childopts: "-Xmx768m" > logviewer.port: 8000 > logviewer.childopts: "-Xmx256m" > logviewer.appender.name: "A1" > drpc.port: 3772 > drpc.worker.threads: 64 > drpc.queue.size: 128 > drpc.invocations.port: 3773 > drpc.request.timeout.secs: 600 > drpc.childopts: "-Xmx768m" > transactional.zookeeper.root: "/transactional" > transactional.zookeeper.servers: null > transactional.zookeeper.port: null > supervisor.slots.ports: > - 6700 > - 6701 > - 6702 > - 6703 > supervisor.childopts: "-Xmx256m" > supervisor.worker.start.timeout.secs: 120 > supervisor.worker.timeout.secs: 30 > supervisor.monitor.frequency.secs: 3 > supervisor.heartbeat.frequency.secs: 5 > supervisor.enable: true > worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC -XX:NewSize=128m > -XX:CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled > -Djava.net.preferIPv4Stack=true" > worker.heartbeat.frequency.secs: 1 > task.heartbeat.frequency.secs: 3 > task.refresh.poll.secs: 10 > zmq.threads: 1 > zmq.linger.millis: 5000 > zmq.hwm: 0 > storm.messaging.netty.server_worker_threads: 4 > storm.messaging.netty.client_worker_threads: 4 > storm.messaging.netty.buffer_size: 10485760 > storm.messaging.netty.max_retries: 100 > storm.messaging.netty.max_wait_ms: 1000 > storm.messaging.netty.min_wait_ms: 100 > topology.enable.message.timeouts: true > topology.debug: false > topology.optimize: true > topology.workers: 1 > topology.acker.executors: null > topology.tasks: null > topology.message.timeout.secs: 30 > topology.skip.missing.kryo.registrations: false > topology.max.task.parallelism: null > topology.max.spout.pending: null > topology.state.synchronization.timeout.secs: 60 > topology.stats.sample.rate: 0.05 > topology.builtin.metrics.bucket.size.secs: 60 > topology.fall.back.on.java.serialization: true > topology.worker.childopts: null > topology.executor.receive.buffer.size: 1024 > topology.executor.send.buffer.size: 1024 > topology.receiver.buffer.size: 8 > topology.transfer.buffer.size: 1024 > topology.tick.tuple.freq.secs: null > topology.worker.shared.thread.pool.size: 4 > topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy" > topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy" > topology.sleep.spout.wait.strategy.time.ms: 1 > topology.error.throttle.interval.secs: 10 > topology.max.error.report.per.interval: 5 > topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory" > topology.tuple.serializer: > "backtype.storm.serialization.types.ListDelegateSerializer" > topology.trident.batch.emit.interval.millis: 500 > dev.zookeeper.path: "/tmp/dev-storm-zookeeper" > The problem is that every time I submit a topology, I got a lot of Netty > messages in my worker logs (found in the supervisor machines) and > many of them had similar to the following messages: > 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries > [5] > 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] > Starting > 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client > connection, connectString=172.31.28.73:2181,172.31.38.251:2181, > 172.31.38.252:2181 sessionTimeout=20000 > watcher=org.apache.storm.curator.ConnectionState@5172aa5a > 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket > connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to > authenticate using SASL (unknown error) > 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection > established to 172.31.28.73/172.31.28.73:2181, initiating session > 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session > establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid > = 0x14e2b0caa01005f, negotiated timeout = 20000 > 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] > State change: CONNECTED > 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update: > :connected:none > 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down > 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session: > 0x14e2b0caa01005f closed > 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries > [5] > 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] > Starting > 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client > connection, connectString=172.31.28.73:2181,172.31.38.251:2181, > 172.31.38.252:2181/storm sessionTimeout=20000 > watcher=org.apache.storm.curator.ConnectionState@3f308697 > 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket > connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to > authenticate using SASL (unknown error) > 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection > established to 172.31.28.73/172.31.28.73:2181, initiating session > 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session > establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid > = 0x14e2b0caa010061, negotiated timeout = 20000 > 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] > State change: CONNECTED > 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments. > 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer > transport plugin:backtype.storm.messaging.netty.Context > 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread > for 58e551ba-f944-4aec-9c8f-5621053021dd:6703 > 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server > Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4 > 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread: > [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ] > 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize: > 10485760 > 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 > scheduled to run in 0 ms > 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 > scheduled to run in 103 ms > 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize: > 10485760 > 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 > scheduled to run in 0 ms > 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize: > 10485760 > 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 > scheduled to run in 0 ms > 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize: > 10485760 > 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6702 > scheduled to run in 0 ms > 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize: > 10485760 > 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 > scheduled to run in 0 ms > 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client, > connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize: > 10485760 > 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 > scheduled to run in 103 ms > 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] > maxRetries too large (100). Pinning to 29 > and > 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry > [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries > [100] > 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1 to > Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 > scheduled to run in 0 ms > 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700 > scheduled to run in 103 ms > 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 > scheduled to run in 105 ms > 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 > scheduled to run in 102 ms > 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702 > scheduled to run in 102 ms > 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700 > scheduled to run in 103 ms > 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to > Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703 > scheduled to run in 103 ms > 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established > to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ > 172.31.19.252:6701 > 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2 to > Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702 > scheduled to run in 102 ms > 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established > to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ > 172.31.19.253:6701 > 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3 to > Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 > scheduled to run in 105 ms > 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 > scheduled to run in 105 ms > 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 > failed: java.lang.RuntimeException: Returned channel was actually not > established > 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3 to > Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 > scheduled to run in 106 ms > 2015-06-25T19:42:33.275+0000 b.s > Why am I getting the above. Initially, I thought that the input rate of > tuples in my topology is too high, and Netty's buffers are filled up too > fast. However, I submitted a debug topology > that sent one tuple every 1 second and I still got the above messages. > Am I doing something wrong in my configuration? Why do I have the previous > Netty messages, which obviously show that something is going wrong? Please, > any hint on my setup will be really helpful. > Regards, > Nick
