Hello, I apologize for forwarding my question again, but it seems to be a general issue in Storm 0.9.4 with Netty (I looked through the user mailing lists yesterday and found similar problems). I mapped my host IPs to aliases in the /etc/hosts file but the problem persists. Is it an open problem (or design decision point) in Storm currently, or can I change something in my configuration to resolve it.
Thank you very much for your time. Regards, Nick ---------- Forwarded message ---------- From: Nick R. Katsipoulakis <[email protected]> Date: 2015-06-25 15:57 GMT-04:00 Subject: Problems with Storm EC2 setup To: [email protected] Hello all, I apologize for the long message, but I have no idea what is going wrong in my setup and I tried to give a lot of info about my cluster. I have the following EC2 setup: 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and a nimbus 2) 4x m4.xlarge nodes for my Supervisors. All of the machines are running Ubuntu Linux v14, OpenJDK v1.7 and Apache Storm v0.9.4. The storm.yaml I am currently having in the nimbus node (only) has the following values: storm.home: "/opt/apache-storm-0.9.4" storm.local.dir: "/mnt/storm" storm.zookeeper.servers: - "172.31.28.73" - "172.31.38.251" - "172.31.38.252" storm.zookeeper.port: 2181 storm.zookeeper.root: "/storm" storm.zookeeper.session.timeout: 20000 storm.zookeeper.connection.timeout: 15000 storm.zookeeper.retry.times: 5 storm.zookeeper.retry.interval: 1000 storm.zookeeper.retry.invervalceiling.millis: 30000 storm.cluster.mode: "distributed" storm.local.mode.zmq: false storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin" storm.messaging.transport: "backtype.storm.messaging.netty.Context" nimbus.host: "127.0.0.1" nimbus.thrift.port: 6627 nimbus.thrift.max_buffer_size: 1048576 nimbus.thrift.threads: 256 nimbus.childopts: "-Xmx256m" nimbus.task.timeout.secs: 30 nimbus.supervisor.timeout.secs: 60 nimbus.monitor.freq.secs: 10 nimbus.cleanup.inbox.freq.secs: 600 nimbus.inbox.jar.expiration.secs: 3600 nimbus.task.launch.secs: 120 nimbus.reassign: true nimbus.file.copy.expiration.secs: 600 nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator" ui.port: 8080 ui.childopts: "-Xmx768m" logviewer.port: 8000 logviewer.childopts: "-Xmx256m" logviewer.appender.name: "A1" drpc.port: 3772 drpc.worker.threads: 64 drpc.queue.size: 128 drpc.invocations.port: 3773 drpc.request.timeout.secs: 600 drpc.childopts: "-Xmx768m" transactional.zookeeper.root: "/transactional" transactional.zookeeper.servers: null transactional.zookeeper.port: null supervisor.slots.ports: - 6700 - 6701 - 6702 - 6703 supervisor.childopts: "-Xmx256m" supervisor.worker.start.timeout.secs: 120 supervisor.worker.timeout.secs: 30 supervisor.monitor.frequency.secs: 3 supervisor.heartbeat.frequency.secs: 5 supervisor.enable: true worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true" worker.heartbeat.frequency.secs: 1 task.heartbeat.frequency.secs: 3 task.refresh.poll.secs: 10 zmq.threads: 1 zmq.linger.millis: 5000 zmq.hwm: 0 storm.messaging.netty.server_worker_threads: 4 storm.messaging.netty.client_worker_threads: 4 storm.messaging.netty.buffer_size: 10485760 storm.messaging.netty.max_retries: 100 storm.messaging.netty.max_wait_ms: 1000 storm.messaging.netty.min_wait_ms: 100 topology.enable.message.timeouts: true topology.debug: false topology.optimize: true topology.workers: 1 topology.acker.executors: null topology.tasks: null topology.message.timeout.secs: 30 topology.skip.missing.kryo.registrations: false topology.max.task.parallelism: null topology.max.spout.pending: null topology.state.synchronization.timeout.secs: 60 topology.stats.sample.rate: 0.05 topology.builtin.metrics.bucket.size.secs: 60 topology.fall.back.on.java.serialization: true topology.worker.childopts: null topology.executor.receive.buffer.size: 1024 topology.executor.send.buffer.size: 1024 topology.receiver.buffer.size: 8 topology.transfer.buffer.size: 1024 topology.tick.tuple.freq.secs: null topology.worker.shared.thread.pool.size: 4 topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy" topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy" topology.sleep.spout.wait.strategy.time.ms: 1 topology.error.throttle.interval.secs: 10 topology.max.error.report.per.interval: 5 topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory" topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer" topology.trident.batch.emit.interval.millis: 500 dev.zookeeper.path: "/tmp/dev-storm-zookeeper" The problem is that every time I submit a topology, I got a lot of Netty messages in my worker logs (found in the supervisor machines) and many of them had similar to the following messages: 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5] 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=172.31.28.73:2181,172.31.38.251:2181, 172.31.38.252:2181 sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@5172aa5a 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to authenticate using SASL (unknown error) 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 172.31.28.73/172.31.28.73:2181, initiating session 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid = 0x14e2b0caa01005f, negotiated timeout = 20000 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update: :connected:none 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x14e2b0caa01005f closed 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5] 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=172.31.28.73:2181,172.31.38.251:2181, 172.31.38.252:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@3f308697 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to authenticate using SASL (unknown error) 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 172.31.28.73/172.31.28.73:2181, initiating session 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid = 0x14e2b0caa010061, negotiated timeout = 20000 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments. 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread for 58e551ba-f944-4aec-9c8f-5621053021dd:6703 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread: [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ] 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize: 10485760 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 scheduled to run in 0 ms 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 scheduled to run in 103 ms 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize: 10485760 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 scheduled to run in 0 ms 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize: 10485760 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 scheduled to run in 0 ms 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize: 10485760 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6702 scheduled to run in 0 ms 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize: 10485760 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 scheduled to run in 0 ms 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client, connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize: 10485760 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 scheduled to run in 103 ms 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (100). Pinning to 29 and 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [100] 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1 to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 scheduled to run in 0 ms 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700 scheduled to run in 103 ms 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703 scheduled to run in 105 ms 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700 scheduled to run in 102 ms 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702 scheduled to run in 102 ms 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700 scheduled to run in 103 ms 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703 scheduled to run in 103 ms 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/ 172.31.19.252:6701 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2 to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702 scheduled to run in 102 ms 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/ 172.31.19.253:6701 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3 to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703 scheduled to run in 105 ms 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701 scheduled to run in 105 ms 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 failed: java.lang.RuntimeException: Returned channel was actually not established 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3 to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702 scheduled to run in 106 ms 2015-06-25T19:42:33.275+0000 b.s Why am I getting the above. Initially, I thought that the input rate of tuples in my topology is too high, and Netty's buffers are filled up too fast. However, I submitted a debug topology that sent one tuple every 1 second and I still got the above messages. Am I doing something wrong in my configuration? Why do I have the previous Netty messages, which obviously show that something is going wrong? Please, any hint on my setup will be really helpful. Regards, Nick -- Nikolaos Romanos Katsipoulakis, University of Pittsburgh, PhD candidate
