Thank you, Fan, for your response. Could you please give me some more
information on how to do what you suggest?

Regards,
Nick
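
For anyone following the thread, here is a minimal sketch of the /etc/hosts
approach suggested in the quoted reply. It assumes that each EC2 private DNS
name encodes its private IP address, as the worker log in the quoted message
suggests (for example, ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254).
The helper names `hosts_line` and `hosts_block` are hypothetical, and the
hostname list is copied from that log; adjust it to the actual cluster.

```python
import re

# EC2 private DNS names seen in the quoted worker log (adjust to your cluster).
HOSTNAMES = [
    "ip-172-31-19-254.us-west-2.compute.internal",
    "ip-172-31-13-184.us-west-2.compute.internal",
    "ip-172-31-19-253.us-west-2.compute.internal",
    "ip-172-31-19-252.us-west-2.compute.internal",
]

# Assumption: the name starts with "ip-A-B-C-D." where A.B.C.D is the private IP.
_EC2_NAME = re.compile(r"^ip-(\d+)-(\d+)-(\d+)-(\d+)\.")


def hosts_line(hostname):
    """Build one /etc/hosts line: '<ip> <full name> <short name>'."""
    match = _EC2_NAME.match(hostname)
    if match is None:
        raise ValueError("not an EC2-style private DNS name: " + hostname)
    ip = ".".join(match.groups())
    short = hostname.split(".", 1)[0]
    return "{} {} {}".format(ip, hostname, short)


def hosts_block(hostnames):
    """Join the generated lines into a block that can be appended to /etc/hosts."""
    return "\n".join(hosts_line(name) for name in hostnames)


if __name__ == "__main__":
    print(hosts_block(HOSTNAMES))
```

The printed lines could then be appended to /etc/hosts on each node (this
needs root access). Whether this resolves the connection errors is not
confirmed in this thread.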

2015-06-25 16:14 GMT-04:00 Fan Jiang <[email protected]>:

> I remember someone posted the same issue yesterday. The problem is that your
> host is somehow resolved as "Netty-Client-*", which is not pingable. You
> may modify /etc/hosts to map the hostnames to IP addresses appropriately, if
> that is allowed.
>
> —
> Sincerely,
> Fan Jiang
>
>
> On Thu, Jun 25, 2015 at 3:57 PM, Nick R. Katsipoulakis <
> [email protected]> wrote:
>
>> Hello all,
>>
>> I apologize for the long message, but I have no idea what is going wrong
>> in my setup and I tried to give a lot of info about my cluster. I have the
>> following EC2 setup:
>>
>> 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and a nimbus
>>
>> 2) 4x m4.xlarge nodes for my Supervisors.
>>
>> All of the machines are running Ubuntu Linux v14, OpenJDK v1.7 and Apache
>> Storm v0.9.4. The storm.yaml that I currently have on the nimbus node
>> (only) contains the following values:
>>
>> storm.home: "/opt/apache-storm-0.9.4"
>> storm.local.dir: "/mnt/storm"
>> storm.zookeeper.servers:
>>      - "172.31.28.73"
>>      - "172.31.38.251"
>>      - "172.31.38.252"
>> storm.zookeeper.port: 2181
>> storm.zookeeper.root: "/storm"
>> storm.zookeeper.session.timeout: 20000
>> storm.zookeeper.connection.timeout: 15000
>> storm.zookeeper.retry.times: 5
>> storm.zookeeper.retry.interval: 1000
>> storm.zookeeper.retry.intervalceiling.millis: 30000
>> storm.cluster.mode: "distributed"
>> storm.local.mode.zmq: false
>> storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
>> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
>>
>> nimbus.host: "127.0.0.1"
>> nimbus.thrift.port: 6627
>> nimbus.thrift.max_buffer_size: 1048576
>> nimbus.thrift.threads: 256
>> nimbus.childopts: "-Xmx256m"
>> nimbus.task.timeout.secs: 30
>> nimbus.supervisor.timeout.secs: 60
>> nimbus.monitor.freq.secs: 10
>> nimbus.cleanup.inbox.freq.secs: 600
>> nimbus.inbox.jar.expiration.secs: 3600
>> nimbus.task.launch.secs: 120
>> nimbus.reassign: true
>> nimbus.file.copy.expiration.secs: 600
>> nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
>>
>> ui.port: 8080
>> ui.childopts: "-Xmx768m"
>> logviewer.port: 8000
>> logviewer.childopts: "-Xmx256m"
>> logviewer.appender.name: "A1"
>>
>> drpc.port: 3772
>> drpc.worker.threads: 64
>> drpc.queue.size: 128
>> drpc.invocations.port: 3773
>> drpc.request.timeout.secs: 600
>> drpc.childopts: "-Xmx768m"
>>
>> transactional.zookeeper.root: "/transactional"
>> transactional.zookeeper.servers: null
>> transactional.zookeeper.port: null
>>
>> supervisor.slots.ports:
>>      - 6700
>>      - 6701
>>      - 6702
>>      - 6703
>> supervisor.childopts: "-Xmx256m"
>> supervisor.worker.start.timeout.secs: 120
>> supervisor.worker.timeout.secs: 30
>> supervisor.monitor.frequency.secs: 3
>> supervisor.heartbeat.frequency.secs: 5
>> supervisor.enable: true
>>
>> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:NewSize=128m
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled
>> -Djava.net.preferIPv4Stack=true"
>> worker.heartbeat.frequency.secs: 1
>>
>> task.heartbeat.frequency.secs: 3
>> task.refresh.poll.secs: 10
>>
>> zmq.threads: 1
>> zmq.linger.millis: 5000
>> zmq.hwm: 0
>>
>> storm.messaging.netty.server_worker_threads: 4
>> storm.messaging.netty.client_worker_threads: 4
>> storm.messaging.netty.buffer_size: 10485760
>> storm.messaging.netty.max_retries: 100
>> storm.messaging.netty.max_wait_ms: 1000
>> storm.messaging.netty.min_wait_ms: 100
>> topology.enable.message.timeouts: true
>> topology.debug: false
>> topology.optimize: true
>> topology.workers: 1
>> topology.acker.executors: null
>> topology.tasks: null
>> topology.message.timeout.secs: 30
>> topology.skip.missing.kryo.registrations: false
>> topology.max.task.parallelism: null
>> topology.max.spout.pending: null
>> topology.state.synchronization.timeout.secs: 60
>> topology.stats.sample.rate: 0.05
>> topology.builtin.metrics.bucket.size.secs: 60
>> topology.fall.back.on.java.serialization: true
>> topology.worker.childopts: null
>> topology.executor.receive.buffer.size: 1024
>> topology.executor.send.buffer.size: 1024
>> topology.receiver.buffer.size: 8
>> topology.transfer.buffer.size: 1024
>> topology.tick.tuple.freq.secs: null
>> topology.worker.shared.thread.pool.size: 4
>> topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
>> topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
>> topology.sleep.spout.wait.strategy.time.ms: 1
>> topology.error.throttle.interval.secs: 10
>> topology.max.error.report.per.interval: 5
>> topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
>> topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
>> topology.trident.batch.emit.interval.millis: 500
>>
>> dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
>>
>> The problem is that every time I submit a topology, I get a lot of Netty
>> messages in my worker logs (found on the supervisor machines), and
>> many of them are similar to the following:
>>
>> 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
>> [5]
>> 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
>> Starting
>> 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
>> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
>> 172.31.38.252:2181 sessionTimeout=20000
>> watcher=org.apache.storm.curator.ConnectionState@5172aa5a
>> 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
>> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
>> established to 172.31.28.73/172.31.28.73:2181, initiating session
>> 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session
>> establishment complete on server 172.31.28.73/172.31.28.73:2181,
>> sessionid = 0x14e2b0caa01005f, negotiated timeout = 20000
>> 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
>> State change: CONNECTED
>> 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update:
>> :connected:none
>> 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut
>> down
>> 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session:
>> 0x14e2b0caa01005f closed
>> 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
>> [5]
>> 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
>> Starting
>> 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
>> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
>> 172.31.38.252:2181/storm sessionTimeout=20000
>> watcher=org.apache.storm.curator.ConnectionState@3f308697
>> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
>> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
>> established to 172.31.28.73/172.31.28.73:2181, initiating session
>> 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session
>> establishment complete on server 172.31.28.73/172.31.28.73:2181,
>> sessionid = 0x14e2b0caa010061, negotiated timeout = 20000
>> 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
>> State change: CONNECTED
>> 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments.
>> 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer
>> transport plugin:backtype.storm.messaging.netty.Context
>> 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread
>> for 58e551ba-f944-4aec-9c8f-5621053021dd:6703
>> 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server
>> Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4
>> 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread:
>> [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ]
>> 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>> 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6703 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6703 scheduled to run in 103 ms
>> 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>> 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6703 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>> 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6702 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>> 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6702 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>> 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6701 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client,
>> connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize:
>> 10485760
>> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6703 scheduled to run in 103 ms
>> 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
>> maxRetries too large (100). Pinning to 29
>>
>> and
>>
>> 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry
>> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
>> [100]
>> 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6700 scheduled to run in 0 ms
>> 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6700 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6700 scheduled to run in 103 ms
>> 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6703 scheduled to run in 105 ms
>> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6700 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6700 scheduled to run in 102 ms
>> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6702 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6702 scheduled to run in 102 ms
>> 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6700 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6700 scheduled to run in 103 ms
>> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6703 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6702 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6703 scheduled to run in 103 ms
>> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established
>> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
>> 172.31.19.252:6701
>> 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6702 scheduled to run in 102 ms
>> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established
>> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
>> 172.31.19.253:6701
>> 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3
>> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
>> 172.31.13.184:6703 scheduled to run in 105 ms
>> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6701 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6701 scheduled to run in 105 ms
>> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6702 failed: java.lang.RuntimeException: Returned channel
>> was actually not established
>> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3
>> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
>> 172.31.19.254:6702 scheduled to run in 106 ms
>> 2015-06-25T19:42:33.275+0000 b.s
>>
>> Why am I getting the above? Initially, I thought that the input rate of
>> tuples in my topology was too high and that Netty's buffers were filling up
>> too fast. However, I submitted a debug topology that sent one tuple per
>> second, and I still got the above messages.
>>
>> Am I doing something wrong in my configuration? Why do I get the
>> Netty messages above, which obviously show that something is going
>> wrong? Any hint about my setup would be really helpful.
>>
>> Regards,
>> Nick
>>
>
>
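
To check whether the worker ports named in the log above are reachable at
all, a small probe like the following may help. This is a sketch only:
`parse_targets` and `can_connect` are hypothetical helper names, and the
regular expression assumes the "creating Netty Client, connecting to
host:port" wording shown in the quoted log. The log file path is passed as a
command-line argument.

```python
import re
import socket
import sys

# Matches the "connecting to <host>:<port>" part of the Netty client log lines.
_TARGET = re.compile(r"connecting to ([\w.-]+):(\d+)")


def parse_targets(log_text):
    """Return the unique (host, port) pairs found in the log, in first-seen order."""
    pairs = ((host, int(port)) for host, port in _TARGET.findall(log_text))
    return list(dict.fromkeys(pairs))


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log_file:
        for host, port in parse_targets(log_file.read()):
            status = "reachable" if can_connect(host, port) else "NOT reachable"
            print("{}:{} {}".format(host, port, status))
```

Running it on a supervisor against that machine's worker log shows which
peers cannot be reached from there. A failure only says that the TCP
connection could not be opened; it does not by itself distinguish a hostname
resolution problem from a closed port or a firewall rule.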


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate
