[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range
[ https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625931#comment-14625931 ] ASF GitHub Bot commented on STORM-643: -- Github user alexsobrino commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121146615 +1
KafkaUtils repeatedly fetches messages whose offset is out of range
Key: STORM-643 URL: https://issues.apache.org/jira/browse/STORM-643 Project: Apache Storm Issue Type: Bug Components: storm-kafka Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5 Reporter: Xin Wang Assignee: Xin Wang Priority: Minor
KafkaUtils repeatedly fetches messages whose offset is out of range. This happens when the failed list (SortedSet<Long> failed) is not empty and some offset in it is out of range.
{code}
[worker-log]
2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with offset out of range: [20919071816]; retrying with default start offset time from configuration. configured start offset time: [-2]
2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 20996130717
2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with offset out of range: [20919071816]; retrying with default start offset time from configuration. configured start offset time: [-2]
2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 20996130717
...
{code}
[FIX]
{code}
// storm.kafka.PartitionManager.fill():
...
try {
    msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
} catch (UpdateOffsetException e) {
    _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic, _partition.partition, _spoutConfig);
    LOG.warn("Using new offset: {}", _emittedToOffset);
    // fetch failed, so don't update the metrics
    // fix bug: remove this offset from the failed list when it is out of range
    if (had_failed) {
        failed.remove(offset);
    }
    return;
}
...
{code}
Also: the log message "retrying with default start offset time from configuration. configured start offset time: [-2]" is incorrect.
[GitHub] storm pull request: [STORM-935] Update Disruptor queue version to ...
Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/630#issuecomment-121150967 @errordaiwa @amontalenti A 1000ms timeout makes sense to me. Actually, a 100ms timeout also makes sense to me, but I'd like to hear others' opinions about load awareness. We now have an option to set the timeout, so it should be no issue. I'll run some performance tests and check that no tuples fail.
[jira] [Commented] (STORM-935) Update Disruptor queue version to 2.10.4
[ https://issues.apache.org/jira/browse/STORM-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625967#comment-14625967 ] ASF GitHub Bot commented on STORM-935: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/630#issuecomment-121150967 @errordaiwa @amontalenti A 1000ms timeout makes sense to me. Actually, a 100ms timeout also makes sense to me, but I'd like to hear others' opinions about load awareness. We now have an option to set the timeout, so it should be no issue. I'll run some performance tests and check that no tuples fail.
Update Disruptor queue version to 2.10.4
Key: STORM-935 URL: https://issues.apache.org/jira/browse/STORM-935 Project: Apache Storm Issue Type: Dependency upgrade Affects Versions: 0.11.0 Reporter: Xingyu Su
Storm still uses an old version of the Disruptor queue (v2.10.1). This version has some potential race problems; version 2.10.4 has fixed these bugs. https://issues.apache.org/jira/browse/STORM-503 will benefit from this update.
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user alexsobrino commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121146615 +1
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user tedxia commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121188995 In branch 0.10.0, failed tuples are managed by ExponentialBackoffMsgRetryManager; should we also make this change in 0.10.0?
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user mvalleavila commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121189676 We are reproducing the issue too, +1 Thx!
[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range
[ https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626140#comment-14626140 ] ASF GitHub Bot commented on STORM-643: -- Github user tedxia commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121188995 In branch 0.10.0, failed tuples are managed by ExponentialBackoffMsgRetryManager; should we also make this change in 0.10.0?
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user ellull commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121180650 We are also facing this issue, so +1
[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range
[ https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626112#comment-14626112 ] ASF GitHub Bot commented on STORM-643: -- Github user ellull commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121180650 We are also facing this issue, so +1
[GitHub] storm pull request: STORM-67 Provide API for spouts to know how ma...
Github user bourneagain commented on the pull request: https://github.com/apache/storm/pull/593#issuecomment-121281726 Thanks @HeartSaVioR. We can have this merged to master whenever we feel appropriate.
[jira] [Commented] (STORM-67) Provide API for spouts to know how many pending messages there are
[ https://issues.apache.org/jira/browse/STORM-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626476#comment-14626476 ] ASF GitHub Bot commented on STORM-67: - Github user bourneagain commented on the pull request: https://github.com/apache/storm/pull/593#issuecomment-121281726 Thanks @HeartSaVioR. We can have this merged to master whenever we feel appropriate.
Provide API for spouts to know how many pending messages there are
Key: STORM-67 URL: https://issues.apache.org/jira/browse/STORM-67 Project: Apache Storm Issue Type: New Feature Reporter: James Xu Assignee: Shyam Rajendran Labels: newbie
https://github.com/nathanmarz/storm/issues/343 This would be useful in case you want to take special action in the spout, like dropping messages.
- Discmt: Hi, I'd like to try and take a crack at this if it's still relevant. I'm not exactly sure what it's asking for, though. It seems to me an implementation for knowing how many pending messages there are for a spout depends on where the spout is getting its information from, which makes me sure I'm missing something.
- revans2: The spout code in backtype/storm/daemon/executor.clj is already keeping track of the pending tuples if acking is enabled. If acking is disabled, nothing is pending. (defmethod mk-threads :spout [executor-data task-datas]) defines pending as a RotatingMap which maps all of the Storm-internal tuple ids to the message id objects passed in by the spout when it first emitted the tuple. The hardest part should be getting pending to a place where the ISpoutOutputCollector implementation, or wherever the API is, can get access to it.
- ptgoetz: @Discmt Yes, this is still relevant and would be nice to have. The Storm framework asks spouts for tuples by calling the nextTuple() method and keeps track of the tuple tree internally. The underlying data source does not come into play. As implied by @revans2, one approach would be to add a method to ISpoutOutputCollector such as getPendingCount() that would allow spout implementations to query for the pending count (probably returning -1 if acking is disabled). The tricky part will likely be bridging the gap between executor.clj and the ISpoutOutputCollector implementation(s). I haven't dug very deeply into the code, so off-hand I don't know how hard that would be. A quick search of the code for TOPOLOGY_MAX_PENDING should point you to some of the touch points. Also keep in mind the dual meaning of TOPOLOGY_MAX_PENDING: in a standard Storm topology it represents the maximum number of outstanding tuples; in a Trident topology it represents the maximum number of outstanding batches.
- Discmt: Hey guys. I've been taking time to look into it, and I feel like I might have an understanding of what exactly it is I need to do. If what @revans2 said is true, and all pending messages are kept within that RotatingMap, then this should be somewhat straightforward. I am trying to compile my own storm.jar file right now but I haven't figured out how. I tried using build_release.sh in the bin directory, but I had no luck. I also tried using lein jar.
- xumingming: try the following: lein sub install; lein install. After these commands are executed, there should be a jar file named storm-xxx.jar in $STORM_HOME/target/.
- Discmt: @xumingming Thanks for the advice. I found that I had Leiningen 1, but the minimum required is Leiningen 2.
- xumingming: yeah, storm requires lein 2 to build: https://github.com/nathanmarz/storm/blob/master/project.clj#L14
- Discmt: Hi guys. I got my development environment squared away and I can properly build releases now (I use the build_release.sh script). I tried making a change the way @ptgoetz and @revans2 had suggested, by adding a method to the output collector to return the pending count, and I have some questions about it. I noticed most of the collector implementations rely on a delegate, or mediator, which I'm assuming is defined here: https://github.com/nathanmarz/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/executor.clj#L504-515. So if I add a method to get the size of pending, defined here https://github.com/nathanmarz/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/executor.clj#L408-414, like so: (SpoutOutputCollector. (reify ISpoutOutputCollector (^int getPendingCount [this] (.size pending)) (^List emit [this ^String stream-id ^List tuple ^Object
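To make the proposal above concrete: the discussed change amounts to one extra method on the collector interface. This is a hedged sketch only; the interface below is a stand-in, not Storm's actual ISpoutOutputCollector, and the hard part (as the thread notes) is wiring the executor's pending RotatingMap through to it.
{code}
import java.util.List;

// Stand-in for the proposed ISpoutOutputCollector extension.
interface PendingAwareSpoutCollector {
    List<Integer> emit(String streamId, List<Object> tuple, Object messageId);

    // Proposed addition: how many emitted tuples are still awaiting ack/fail.
    // Would likely return -1 when acking is disabled, since nothing is tracked then.
    long getPendingCount();
}
{code}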
[jira] [Commented] (STORM-615) Add REST API to upload topology
[ https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626548#comment-14626548 ] Sriharsha Chintalapani commented on STORM-615: -- [~revans2] do you have any suggestions on the above approach?
Add REST API to upload topology
Key: STORM-615 URL: https://issues.apache.org/jira/browse/STORM-615 Project: Apache Storm Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Arun Mahadevan Fix For: 0.10.0
Add a REST API /api/v1/submitTopology to upload topology jars and config.
[GitHub] storm pull request: [STORM-935] Update Disruptor queue version to ...
Github user errordaiwa commented on the pull request: https://github.com/apache/storm/pull/630#issuecomment-121140113 I did a performance test using Storm 0.9.3 with Disruptor queue 2.10.4. The target topology is the one mentioned in [STORM-503](https://github.com/apache/storm/pull/625). To make the results clearer, I raised the bolt count to 1000. Here is the CPU usage:
+ base (Storm running with nothing): user 2%, sys 1.5%
+ no timeout: user 4%, sys 1.5%
+ 1000ms timeout: user 5%, sys 2%
+ 100ms timeout: user 6%, sys 4.5%
+ 10ms timeout: user 17%, sys 26%
[jira] [Commented] (STORM-615) Add REST API to upload topology
[ https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626595#comment-14626595 ] Robert Joseph Evans commented on STORM-615: --- Using policy files would work to prevent the code from doing bad things in the OS as a privileged user. But I don't think it solves the issue of authentication with nimbus. No matter how we run the user code, it still needs to authenticate with nimbus, and we need to give that code credentials to do so. We cannot use the UI user's credentials to do it because the end user could steal them, unless we do something where we hand the code a nimbus connection that is already authenticated and locked down in such a way that nimbus will enforce it being the user that we want. But that code does not currently exist, either on the client side or on the nimbus side. If we are going to make big changes like that, I would much rather have us look at flux, and see if we can submit a topology with a jar and a config file, possibly having both of them in a single jar file. Instead of having the bolts and spouts deserialized in the worker, we could call a constructor and instantiate them directly in the worker, like what flux does. There is already a thrift definition for some of this, but I am not sure how advanced/tested it is, or what changes we would need to make to flux to support it. With this we no longer need to run any user code outside of the worker at all, or load an untrusted jar file. We just read the config file and submit the topology using the proxy settings.
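To illustrate the flux-style idea in that comment: rather than deserializing opaque user objects, a submitter could instantiate components from class names listed in a plain config. This is only a sketch of the concept; buildComponent and the config shape are hypothetical, not flux's actual API.
{code}
import java.lang.reflect.Constructor;

public class ConstructorSubmitSketch {
    // Hypothetical: build a spout/bolt from a class name and constructor args
    // taken from a topology config file, instead of deserializing user bytes.
    static Object buildComponent(String className, Object... ctorArgs) throws Exception {
        for (Constructor<?> c : Class.forName(className).getConstructors()) {
            if (c.getParameterCount() == ctorArgs.length) {
                return c.newInstance(ctorArgs);
            }
        }
        throw new IllegalArgumentException("no matching constructor: " + className);
    }

    public static void main(String[] args) throws Exception {
        // e.g. a line like "spout.class: org.example.MySpout" in the config;
        // StringBuilder is just a stand-in class for this demo.
        Object component = buildComponent("java.lang.StringBuilder");
        System.out.println(component.getClass());
    }
}
{code}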
[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626643#comment-14626643 ] ASF GitHub Bot commented on STORM-937: -- GitHub user rfarivar opened a pull request: https://github.com/apache/storm/pull/631 [STORM-937] Changing the log level from info to debug This is to reduce the noisiness of supervisor logs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rfarivar/storm STORM-937 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #631 commit 2d030413ab651140a3dd3655673b7bb81c1ce202 Author: rfarivar rfari...@yahoo-inc.com Date: 2015-07-14T16:51:59Z Changing the log level from info to debug
[jira] [Created] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
Reza Farivar created STORM-937: -- Summary: StormBoundedExponentialBackoffRetry too noisy, lower log level Key: STORM-937 URL: https://issues.apache.org/jira/browse/STORM-937 Project: Apache Storm Issue Type: Improvement Reporter: Reza Farivar Assignee: Reza Farivar Priority: Minor
The supervisor logs are currently overpopulated with log messages similar to this:
2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
The log level in StormBoundedExponentialBackoffRetry is currently info. It seems it can safely be lowered to debug.
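The fix itself is a one-line log-level drop. Here is a minimal sketch of the idea, assuming an slf4j logger; the class below is a stand-in, not the actual backtype.storm.utils.StormBoundedExponentialBackoffRetry source.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BoundedRetrySketch {
    private static final Logger LOG = LoggerFactory.getLogger(BoundedRetrySketch.class);

    public BoundedRetrySketch(int baseSleepTimeMs, int maxSleepTimeMs, int maxRetries) {
        // Previously emitted at INFO on every instantiation, flooding supervisor logs;
        // the gist of PR #631 is to log the same message at DEBUG instead.
        LOG.debug("The baseSleepTimeMs [{}] the maxSleepTimeMs [{}] the maxRetries [{}]",
                baseSleepTimeMs, maxSleepTimeMs, maxRetries);
    }
}
{code}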
[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626644#comment-14626644 ] Reza Farivar commented on STORM-937: Pull Request https://github.com/apache/storm/pull/631
[jira] [Issue Comment Deleted] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza Farivar updated STORM-937: --- Comment: was deleted (was: Pull Request https://github.com/apache/storm/pull/631)
Re: A question about setup-default-uncaught-exception-handler function
Hi Chuanlei, The setup-default-uncaught-exception-handler fails fast on both OOM and other uncaught exceptions. The difference in treatment is Runtime.halt vs Runtime.exit (which handles the other uncaught exceptions). On an OOM error, we shut down the JVM using Runtime.halt. In other cases, we call Runtime.exit, which invokes all registered shutdown hooks, giving other parts a chance to finalize gracefully. Calling Runtime.halt is an extreme measure, as it shuts down the system without calling any shutdown hooks. This extreme step is essential for OOM because an attempt to handle it can itself throw more OOMs. So to answer your question in short: we are failing fast; running the other shutdown hooks or not is the only difference. -Kishor On Tuesday, July 14, 2015 10:40 AM, Chuanlei Ni nichuan...@gmail.com wrote: Hi, I want to know why setup-default-uncaught-exception-handler only deals with the OOM error, since fail-fast is the philosophy of Storm's design. When a thread crashes in a Storm process, the process will mostly lose its functionality. Why not exit the whole process when an uncaught exception happens? If we handled exceptions that way, we could remove a lot of labor for Storm ops. Thanks in advance!
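Kishor's description maps to a handler along these lines. A minimal Java sketch of the fail-fast policy; Storm's real implementation lives in Clojure in backtype.storm.util, so this is an illustration only.
{code}
public final class FailFastHandler {
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                // Halt immediately, skipping shutdown hooks: running hooks after
                // an OOM can itself throw further OutOfMemoryErrors.
                Runtime.getRuntime().halt(-1);
            } else {
                System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
                // Fail fast, but run registered shutdown hooks so other parts
                // of the process get a chance to finalize gracefully.
                System.exit(-1);
            }
        });
    }
}
{code}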
[GitHub] storm pull request: [STORM-937] Changing the log level from info t...
GitHub user rfarivar opened a pull request: https://github.com/apache/storm/pull/631 [STORM-937] Changing the log level from info to debug This is to reduce the noisiness of supervisor logs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rfarivar/storm STORM-937 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #631 commit 2d030413ab651140a3dd3655673b7bb81c1ce202 Author: rfarivar rfari...@yahoo-inc.com Date: 2015-07-14T16:51:59Z Changing the log level from info to debug
[GitHub] storm pull request: Storm 763/839 0.10.x
Github user knusbaum commented on a diff in the pull request: https://github.com/apache/storm/pull/617#discussion_r34599960
--- Diff: storm-core/src/jvm/backtype/storm/messaging/netty/Client.java ---
@@ -59,20 +59,16 @@
 * - Connecting and reconnecting are performed asynchronously.
 * - Note: The current implementation drops any messages that are being enqueued for sending if the connection to
 *   the remote destination is currently unavailable.
- * - A background flusher thread is run in the background. It will, at fixed intervals, check for any pending messages
- *   (i.e. messages buffered in memory) and flush them to the remote destination iff background flushing is currently
- *   enabled.
 */
public class Client extends ConnectionWithStatus implements IStatefulObject {
+    private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 600000L;
--- End diff --
This seems like an incredibly large timeout.
[GitHub] storm pull request: Storm 763/839 0.10.x
Github user eshioji commented on a diff in the pull request: https://github.com/apache/storm/pull/617#discussion_r34602607 (on the same PENDING_MESSAGES_FLUSH_TIMEOUT_MS line of Client.java quoted above) Maybe, though this value was inherited from the current code. Maybe @miguno can shed light on the rationale?
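For context on what that constant governs: it bounds how long the client keeps trying to flush buffered messages (for example, before closing) rather than blocking forever on a dead connection. A simplified sketch of the pattern follows, with names and structure chosen for illustration rather than taken from Storm's Client.java.
{code}
import java.util.concurrent.atomic.AtomicInteger;

public class FlushWaitSketch {
    // Block until the pending-message counter drains, or until the timeout
    // elapses, so a dead connection cannot stall shutdown indefinitely.
    static void awaitPendingFlush(AtomicInteger pendingMessages, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (pendingMessages.get() > 0 && System.currentTimeMillis() < deadline) {
            Thread.sleep(10); // poll until drained or timed out
        }
    }
}
{code}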
[jira] [Updated] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A
[ https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated STORM-763: -- Assignee: Enno Shioji
nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A
Key: STORM-763 URL: https://issues.apache.org/jira/browse/STORM-763 Project: Apache Storm Issue Type: Bug Affects Versions: 0.9.4 Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux, java version 1.7.0_03, storm 0.9.4, cluster 50+ machines Reporter: 3in Assignee: Enno Shioji
My topology has 50+ workers; it can't emit 5 thousand tuples in ten minutes. Sometimes one worker is reassigned to another machine by nimbus because of a task heartbeat timeout:
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[440 440] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[90 90] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[510 510] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[160 160] not alive
I can see in the Storm UI that the reassigned worker has already started, but other workers write error logs all the time:
2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
The worker on the destination host has already started, and I can telnet to 192.168.163.19 5700. So why can't the netty client connect to that ip:port?
[jira] [Commented] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A
[ https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626944#comment-14626944 ] Enno Shioji commented on STORM-763: --- [~revans2] Yay, thank you! I'll ping [~ptgoetz] on the dev mailing list.
Re: 0.9.6 release for STORM-763/839?
In addition to Enno's mail, I'd like to know whether we'd like to maintain three version lines (currently 0.9.x, 0.10.x, 0.11.x) continuously, or whether it is just a period of transition. I'm maintaining three version lines (bugfix, next minor, next major - respecting semver) on another project, and sometimes maintaining them is really annoying, though most of the issues are breaking changes between the next major version and the current one. Since the stable version of Storm is 0.9.5, and some users want to upgrade Storm with minimized impact, releasing 0.9.6 could make sense for now. (It may be better to collect some bugfixes and backport them before releasing 0.9.6, since the other version lines already have many bugfixes which can be applied to 0.9.x.) But I also think we should make an effort to stabilize 0.10.0 and release an official version. Storm 0.10.0-beta was released one month ago, and we don't have a plan to release the next beta or a stable version. After releasing 0.10.0 we can choose whether we maintain the 0.9.x line or not. If it's possible to release a stable version of 0.10.0 faster, for example before the end of this month, then we can delegate applying any bugfix issues (except critical / blocker) to 0.10.0. tl;dr. We can make an effort to stabilize 0.10.0 and release an official version faster. If a stable version of 0.10.0 could be released shortly, I would not want to release any new 0.9.x versions except for critical or blocker bugs. Thanks, Jungtaek Lim (HeartSaVioR)
[GitHub] storm pull request: [STORM-937] Changing the log level from info t...
Github user nathanmarz commented on the pull request: https://github.com/apache/storm/pull/631#issuecomment-121403303 +1
[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627113#comment-14627113 ] ASF GitHub Bot commented on STORM-937: -- Github user nathanmarz commented on the pull request: https://github.com/apache/storm/pull/631#issuecomment-121403303 +1
[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627098#comment-14627098 ] ASF GitHub Bot commented on STORM-937: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/631#issuecomment-121400695 +1
[GitHub] storm pull request: [STORM-937] Changing the log level from info t...
Github user knusbaum commented on the pull request: https://github.com/apache/storm/pull/631#issuecomment-121365995 +1
0.9.6 release for STORM-763/839?
Hi Taylor, @Bobby recommended I talk to you about a potential 0.9.6 release to fix STORM-763/839. The fixes are already merged to master, 0.10.x and 0.9.x. In a nutshell, they fix the following symptoms -- but the catch is that these symptoms only surface when connections between bolts are lost frequently, so I'm not sure whether it warrants a release. The symptoms are:
- Thread deadlock hazard in Netty Client (STORM-839)
- Slow reconnection (STORM-763)
- Verbose error log (STORM-763)
Anyway, just thought I'd ping you. Btw, thanks for your work; we are using Storm extensively and it's been amazing! Enno
[jira] [Resolved] (STORM-839) Deadlock hazard in backtype.storm.messaging.netty.Client
[ https://issues.apache.org/jira/browse/STORM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved STORM-839. --- Resolution: Fixed Assignee: Enno Shioji Fix Version/s: 0.9.6 Thanks [~eshioji], I merged this into master, branch-0.10.x and branch-0.9.x. You may want to talk to [~ptgoetz] about doing a 0.9.6 release.
Deadlock hazard in backtype.storm.messaging.netty.Client
Key: STORM-839 URL: https://issues.apache.org/jira/browse/STORM-839 Project: Apache Storm Issue Type: Bug Affects Versions: 0.9.4 Reporter: Enno Shioji Assignee: Enno Shioji Priority: Critical Fix For: 0.9.6
See the thread dump below, which shows the deadlock: client-worker-1 is holding 7b5a7fa5 and waiting on 1446a1e9, while Thread-10-disruptor-worker-transfer-queue is holding 1446a1e9 and waiting on 7b5a7fa5. (The thread dump is truncated to show only the relevant parts.)
2015-05-28 15:37:15
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode):
Thread-10-disruptor-worker-transfer-queue - Thread t@52
  java.lang.Thread.State: BLOCKED
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
    - waiting to lock 7b5a7fa5 (a java.lang.Object) owned by client-worker-1 t@25
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
    at org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
    at org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
    at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
    at org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
    - locked 1446a1e9 (a backtype.storm.messaging.netty.Client)
    at backtype.storm.messaging.netty.Client.send(Client.java:412)
    - locked 1446a1e9 (a backtype.storm.messaging.netty.Client)
    at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014$fn__5015.invoke(worker.clj:334)
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014.invoke(worker.clj:332)
    at backtype.storm.disruptor$clojure_handler$reify__1446.onEvent(disruptor.clj:58)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1459.invoke(disruptor.clj:94)
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Unknown Source)
  Locked ownable synchronizers:
    - None
client-worker-1 - Thread t@25
  java.lang.Thread.State: BLOCKED
    at backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
    - waiting to lock 1446a1e9 (a backtype.storm.messaging.netty.Client) owned by Thread-10-disruptor-worker-transfer-queue t@52
    at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
    at backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
    at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
    at org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:437)
    - locked 7b5a7fa5 (a java.lang.Object)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
    at
[jira] [Updated] (STORM-615) Add REST API to upload topology
[ https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated STORM-615: --- Fix Version/s: (was: 0.10.0)
[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range
[ https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627459#comment-14627459 ] ASF GitHub Bot commented on STORM-643: -- Github user vesense commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121473865 @HeartSaVioR @tedxia This PR is for branch 0.9.x. 0.9.x is very different from 0.10.x and master, so when we do the cherry-pick there are a lot of conflicts. In branch 0.10.0, failed tuples are managed by ExponentialBackoffMsgRetryManager, so we should test and verify whether we can reproduce the issue in 0.10.0.
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user vesense commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121473865 @HeartSaVioR @tedxia This PR is for branch 0.9.x. 0.9.x is very different from 0.10.x and master, so when we do the cherry-pick there are a lot of conflicts. In branch 0.10.0, failed tuples are managed by ExponentialBackoffMsgRetryManager, so we should test and verify whether we can reproduce the issue in 0.10.0.
Re: A question about setup-default-uncaught-exception-handler function
OK, understood. Thank you very much. And sorry for reading the code so halfheartedly! Thank you again!
Re: 0.9.6 release for STORM-763/839?
My opinion: In terms of supporting the 0.9.x release line, I think that’s critical, at the very least until 0.10.0 stable is released, and more likely for some time thereafter for those who don’t want to (or can’t) upgrade. Updates to the 0.9.x line should be limited to bug fixes. I’d like to release a 0.10.0-beta2 or 0.10.0 soon. I haven’t seen much in terms of end-user feedback on the beta; a few user/dev success stories would make me feel better about dropping the beta tag. As always, I’m open to any and all opinions. -Taylor
[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...
Github user vesense commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121473131 @miguno Sorry I took so long to respond. As @tpiscitell already explained, this problem is not solved by STORM-586 and STORM-511.
[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range
[ https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627456#comment-14627456 ] ASF GitHub Bot commented on STORM-643: -- Github user vesense commented on the pull request: https://github.com/apache/storm/pull/405#issuecomment-121473131 @miguno Sorry I took so long to respond. As @tpiscitell already explained, this problem is not solved by STORM-586 and STORM-511.
[jira] [Assigned] (STORM-918) Storm CLI could validate arguments/print usage
[ https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Rajendran reassigned STORM-918: Assignee: Shyam Rajendran

Storm CLI could validate arguments/print usage

Key: STORM-918
URL: https://issues.apache.org/jira/browse/STORM-918
Project: Apache Storm
Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Derek Dagit
Assignee: Shyam Rajendran
Priority: Minor
Labels: Newbie

It would be nice if the storm CLI printed usage information when arguments are missing. For example, when omitting the argument to the kill sub-command, a JVM is launched and an exception is thrown complaining that a topology named 'nil' is not alive.
[jira] [Updated] (STORM-918) Storm CLI could validate arguments/print usage
[ https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Rajendran updated STORM-918: Assignee: (was: Shyam Rajendran)

Key: STORM-918
URL: https://issues.apache.org/jira/browse/STORM-918
[jira] [Closed] (STORM-869) kafka spout cannot fetch message if log size is above fetchSizeBytes
[ https://issues.apache.org/jira/browse/STORM-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani closed STORM-869. Resolution: Invalid

kafka spout cannot fetch message if log size is above fetchSizeBytes

Key: STORM-869
URL: https://issues.apache.org/jira/browse/STORM-869
Project: Apache Storm
Issue Type: Bug
Components: storm-kafka
Affects Versions: 0.9.5
Reporter: Adrian Seungjin Lee
Priority: Critical

Let's say maxFetchSizeBytes is set to 1 megabyte; then, if there exists a message bigger than 1 MB, the Kafka spout just hangs and becomes inactive. This happens both in the Kafka spout for spout/bolt topologies and in Trident spouts.
[jira] [Commented] (STORM-869) kafka spout cannot fetch message if log size is above fetchSizeBytes
[ https://issues.apache.org/jira/browse/STORM-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627501#comment-14627501 ] Sriharsha Chintalapani commented on STORM-869: [~sweetest_sj] this is not an issue with the Kafka spout; it's how the Kafka API works. Your fetch.max.message.bytes should always be higher than the maximum message size. You can check the broker's server.properties for message.max.bytes and make sure you set fetch.max.message.bytes equal to or higher than it.

Key: STORM-869
URL: https://issues.apache.org/jira/browse/STORM-869
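To make the relationship between the broker and spout settings concrete, here is a minimal sketch using the storm-kafka SpoutConfig. The ZooKeeper address, topic, spout id, and sizes are hypothetical; the point is only that the spout's fetch size must be at least the broker's message.max.bytes, or a single oversized message can never fit in a fetch response and the spout stalls on that offset.
{code}
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class FetchSizeExample {
    public static KafkaSpout buildSpout() {
        // Hypothetical ZooKeeper address, topic, zkRoot, and spout id.
        ZkHosts hosts = new ZkHosts("zk1:2181");
        SpoutConfig config = new SpoutConfig(hosts, "events", "/kafka-spout", "spout-id");
        config.scheme = new SchemeAsMultiScheme(new StringScheme());

        // If the broker's server.properties allows messages up to, say,
        // 2 MB (message.max.bytes=2097152), keep the spout's fetch and
        // buffer sizes at or above that limit so every message fits.
        config.fetchSizeBytes  = 4 * 1024 * 1024;
        config.bufferSizeBytes = 4 * 1024 * 1024;

        return new KafkaSpout(config);
    }
}
{code}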
[GitHub] storm pull request: STORM-918 Storm CLI could validate arguments/p...
GitHub user bourneagain opened a pull request: https://github.com/apache/storm/pull/632

STORM-918 Storm CLI could validate arguments/print usage

Storm commands that mandate proper args would now print the function's doc string as help, rather than launching a JVM process that exits with an error. Example:

./storm kill
Syntax: [storm kill topology-name [-w wait-time-secs]]
Kills the topology with the name topology-name. Storm will first deactivate the topology's spouts for the duration of the topology's message timeout to allow all messages currently being processed to finish processing. Storm will then shutdown the workers and clean up their state. You can override the length of time Storm waits between deactivation and shutdown with the -w flag.

./storm get-errors
Syntax: [storm get-errors topology-name]
Get the latest error from the running topology. The returned result contains key-value pairs of component-name and component-error for the components in error. The result is returned in JSON format.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bourneagain/storm STORM-918
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/632.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #632

commit fc09bf07afca95a82cbdbb4f10d47c56bdff3918
Author: Shyam Rajendran rshyam@gmail.com
Date: 2015-07-15T05:42:41Z
STORM-918 Storm CLI could validate arguments/print usage
Storm commands that mandate proper args would now print the function's doc string as help rather than erroring out after creating the JVM process.
[jira] [Commented] (STORM-742) Very busy ShellBolt subprocess with ACK mode cannot respond heartbeat just in time
[ https://issues.apache.org/jira/browse/STORM-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627496#comment-14627496 ] ASF GitHub Bot commented on STORM-742: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/497#issuecomment-121484496 @dashengju Could you share your patch (limiting queue size) if you don't mind? Very busy ShellBolt subprocess with ACK mode cannot respond heartbeat just in time -- Key: STORM-742 URL: https://issues.apache.org/jira/browse/STORM-742 Project: Apache Storm Issue Type: Bug Affects Versions: 0.9.3, 0.10.0, 0.9.4, 0.11.0 Reporter: Jungtaek Lim Assignee: Jungtaek Lim Priority: Critical As [~dashengju] stated from STORM-738, very busy ShellBolt subprocess cannot respond heartbeat just in time. Actually it's by design constraint (more details are on STORM-513 or STORM-738), but ShellSpout avoids constraint by updating heartbeat at any type of response from subprocess. We can apply this approach to ShellBolt and let ShellBolt avoid design constraint, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
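A minimal sketch of the ShellSpout-style approach described above, assuming a reader thread per subprocess; the class and helper names are illustrative stand-ins, not the real ShellBolt internals. The idea is to treat any message from the subprocess as proof of life rather than waiting for an explicit heartbeat reply.
{code}
import java.util.concurrent.atomic.AtomicLong;

public class SubprocessReader implements Runnable {
    private final AtomicLong lastHeartbeatMs = new AtomicLong(System.currentTimeMillis());

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            Object msg = readMessageFromSubprocess();
            // A busy subprocess may be too backed up to answer heartbeat
            // requests, but every ack/emit it sends already proves it is
            // alive, so refresh the timestamp on any message at all.
            lastHeartbeatMs.set(System.currentTimeMillis());
            dispatch(msg);
        }
    }

    /** Called by the worker's periodic heartbeat check. */
    public boolean isAlive(long timeoutMs) {
        return System.currentTimeMillis() - lastHeartbeatMs.get() < timeoutMs;
    }

    // Hypothetical stubs standing in for the real multilang plumbing.
    private Object readMessageFromSubprocess() { return new Object(); }
    private void dispatch(Object msg) { }
}
{code}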
[jira] [Assigned] (STORM-837) HdfsState ignores commits
[ https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Mahadevan reassigned STORM-837: Assignee: Arun Mahadevan

HdfsState ignores commits

Key: STORM-837
URL: https://issues.apache.org/jira/browse/STORM-837
Project: Apache Storm
Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Arun Mahadevan
Priority: Critical

HdfsState works with Trident, which is supposed to provide exactly-once processing. It does this in two ways: first by informing the state about commits so it can be sure the data is written out, and second by having a commit id so that double commits can be handled. HdfsState ignores the beginCommit and commit calls, and with that ignores the ids. This means that if you use HdfsState and your worker crashes, you may both lose data and get some data twice. At a minimum, the flush and file rotation should be tied to the commit in some way, and the commit id should be written out with the data so someone reading the data has a hope of deduping it themselves. Also, with the rotationActions it is possible for a partially written file to be leaked and never moved to its final location, because it is not rotated. I personally think the actions are too generic for this case and need to be deprecated.
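Trident's storm.trident.state.State interface exposes beginCommit(Long) and commit(Long). A minimal sketch of honoring them might look like the following, with FileBackend as a hypothetical stand-in for HdfsState's writer; this is not the actual HdfsState code.
{code}
import storm.trident.state.State;

public class CommitAwareState implements State {
    private final FileBackend backend = new FileBackend();
    private Long currentTxId;

    @Override
    public void beginCommit(Long txid) {
        currentTxId = txid;             // remember which batch we are writing
    }

    @Override
    public void commit(Long txid) {
        backend.writeTxIdMarker(txid);  // persist the id next to the data
        backend.flush();                // make the batch durable before it acks
    }

    public void write(byte[] record) {
        backend.append(currentTxId, record);
    }

    /** Hypothetical minimal backend stub so the sketch is self-contained. */
    static class FileBackend {
        void writeTxIdMarker(Long txid) { /* e.g. a marker record or filename suffix */ }
        void flush() { }
        void append(Long txid, byte[] record) { }
    }
}
{code}
With the txid stored alongside the records, a reader can dedupe a replayed batch by skipping records whose txid it has already seen.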
[jira] [Assigned] (STORM-918) Storm CLI could validate arguments/print usage
[ https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Rajendran reassigned STORM-918: Assignee: Shyam Rajendran

Key: STORM-918
URL: https://issues.apache.org/jira/browse/STORM-918
0.9.6 release for STORM-763/839?
To be more clear, I'm fine with maintaining three minor version lines while it is a period of transition. The one thing I'm curious about is whether we'd like to maintain two stable version lines (with one unstable) continuously.

For now, I'm fine with releasing 0.9.6 after backporting bugfixes which are already committed to master but not yet backported. (The same applies to 0.10.0.)

Btw, 0.10.0 has its own issue, STORM-903 https://issues.apache.org/jira/browse/STORM-903, which has been discussed but not resolved. We may want to resolve it before the next release.

Best,
Jungtaek Lim (HeartSaVioR)
--
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
[GitHub] storm pull request: STORM-742 Let ShellBolt treat all messages to ...
Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/497#issuecomment-121484496 @dashengju Could you share your patch (limiting queue size) if you don't mind?
[GitHub] storm pull request: Storm 763/839 0.10.x
Github user revans2 commented on a diff in the pull request: https://github.com/apache/storm/pull/617#discussion_r34603263

--- Diff: storm-core/src/jvm/backtype/storm/messaging/netty/Client.java ---
@@ -59,20 +59,16 @@
  * - Connecting and reconnecting are performed asynchronously.
  * - Note: The current implementation drops any messages that are being enqueued for sending if the connection to
  *   the remote destination is currently unavailable.
- * - A background flusher thread is run in the background. It will, at fixed intervals, check for any pending messages
- *   (i.e. messages buffered in memory) and flush them to the remote destination iff background flushing is currently
- *   enabled.
  */
 public class Client extends ConnectionWithStatus implements IStatefulObject {
+    private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 600000L;
--- End diff --

It is large, but it matches the code that was there before. This is only used on a close, as a timeout when trying to be sure that we have sent all pending messages. We should look at potentially lowering it, but I think that is for a different JIRA.
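For context, the close-time flush the comment refers to amounts to a bounded wait: spin until the pending counter drains or the timeout elapses, then tear down. A sketch under those assumptions follows; only the timeout constant's name mirrors the diff, and everything else is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulCloser {
    private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 600000L;
    private static final long POLL_INTERVAL_MS = 1000L;

    // Incremented when a message is enqueued, decremented once it is
    // written to the wire; close() waits for this to reach zero.
    private final AtomicInteger pendingMessages = new AtomicInteger(0);

    public void close() throws InterruptedException {
        long deadline = System.currentTimeMillis() + PENDING_MESSAGES_FLUSH_TIMEOUT_MS;
        while (pendingMessages.get() > 0 && System.currentTimeMillis() < deadline) {
            TimeUnit.MILLISECONDS.sleep(POLL_INTERVAL_MS);
        }
        // Tear down regardless; anything still pending is dropped, which
        // matches the class's documented drop-on-unavailable policy.
    }
}
```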
[GitHub] storm pull request: Storm 763/839 0.11.x
Github user revans2 commented on the pull request: https://github.com/apache/storm/pull/616#issuecomment-121338740 +1 the code compiles and the tests all pass
[GitHub] storm pull request: [STORM-763] nimbus reassigned worker A to anot...
Github user revans2 commented on the pull request: https://github.com/apache/storm/pull/568#issuecomment-121338790 +1 the code compiles and the tests all pass
[GitHub] storm pull request: Storm 763/839 0.10.x
Github user revans2 commented on the pull request: https://github.com/apache/storm/pull/617#issuecomment-121338762 +1 the code compiles and the tests all pass
[jira] [Commented] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A
[ https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626850#comment-14626850 ] ASF GitHub Bot commented on STORM-763: Github user revans2 commented on the pull request: https://github.com/apache/storm/pull/568#issuecomment-121338790 +1 the code compiles and the tests all pass

nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A

Key: STORM-763
URL: https://issues.apache.org/jira/browse/STORM-763
Project: Apache Storm
Issue Type: Bug
Affects Versions: 0.9.4
Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux, java version 1.7.0_03, storm 0.9.4, cluster of 50+ machines
Reporter: 3in

My topology has 50+ workers, and it can't emit 5 thousand tuples in ten minutes. Sometimes one worker is reassigned to another machine by nimbus because of task heartbeat timeout:

2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[440 440] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[90 90] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[510 510] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[160 160] not alive

I can see that the reassigned worker has already started in the Storm UI, but other workers write error logs all the time:

2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable

The worker on the destination host has already started, and I can telnet to 192.168.163.19 5700. So why can't the Netty client connect to that ip:port?
[GitHub] storm pull request: [STORM-763] nimbus reassigned worker A to anot...
Github user knusbaum commented on the pull request: https://github.com/apache/storm/pull/568#issuecomment-121344016 +1
[GitHub] storm pull request: Storm 763/839 0.11.x
Github user knusbaum commented on the pull request: https://github.com/apache/storm/pull/616#issuecomment-121343845 +1 tests pass, examples run.
[GitHub] storm pull request: Storm 763/839 0.10.x
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/617
[GitHub] storm pull request: Storm 763/839 0.11.x
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/616
[GitHub] storm pull request: [STORM-839] Deadlock hazard in backtype.storm....
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/566
[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level
[ https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626756#comment-14626756 ] ASF GitHub Bot commented on STORM-937: Github user d2r commented on a diff in the pull request: https://github.com/apache/storm/pull/631#discussion_r34597439

--- Diff: storm-core/src/jvm/backtype/storm/utils/StormBoundedExponentialBackoffRetry.java ---
@@ -44,7 +44,7 @@ public StormBoundedExponentialBackoffRetry(int baseSleepTimeMs, int maxSleepTime
         expRetriesThreshold = 1;
         while ((1 << (expRetriesThreshold + 1)) < ((maxSleepTimeMs - baseSleepTimeMs) / 2))
             expRetriesThreshold++;
-        LOG.info("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "]" +
+        LOG.debug("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "]" +
--- End diff --

It would be nice to use the parameterized call, so that Strings are not constructed unnecessarily:

```Java
LOG.debug("The baseSleepTimeMs [{}] ...", baseSleepTimeMs, ...);
```

StormBoundedExponentialBackoffRetry too noisy, lower log level

Key: STORM-937
URL: https://issues.apache.org/jira/browse/STORM-937
Project: Apache Storm
Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

The supervisor logs are currently overpopulated with log messages similar to this:

2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]

The log level in StormBoundedExponentialBackoffRetry is currently at info level. It seems it can safely be lowered to debug.
[GitHub] storm pull request: [STORM-937] Changing the log level from info t...
Github user d2r commented on a diff in the pull request: https://github.com/apache/storm/pull/631#discussion_r34597439

--- Diff: storm-core/src/jvm/backtype/storm/utils/StormBoundedExponentialBackoffRetry.java ---
@@ -44,7 +44,7 @@ public StormBoundedExponentialBackoffRetry(int baseSleepTimeMs, int maxSleepTime
         expRetriesThreshold = 1;
         while ((1 << (expRetriesThreshold + 1)) < ((maxSleepTimeMs - baseSleepTimeMs) / 2))
             expRetriesThreshold++;
-        LOG.info("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "]" +
+        LOG.debug("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "]" +
--- End diff --

It would be nice to use the parameterized call, so that Strings are not constructed unnecessarily:

```Java
LOG.debug("The baseSleepTimeMs [{}] ...", baseSleepTimeMs, ...);
```
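To spell out the suggestion: with '+' concatenation the log message String is built on every call even when DEBUG is off, whereas the '{}' placeholder form defers formatting until SLF4J knows the level is enabled. A self-contained sketch (the class name is hypothetical):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RetryPolicyLogging {
    private static final Logger LOG = LoggerFactory.getLogger(RetryPolicyLogging.class);

    public RetryPolicyLogging(int baseSleepTimeMs, int maxSleepTimeMs, int maxRetries) {
        // No message String is constructed here unless DEBUG is enabled.
        LOG.debug("The baseSleepTimeMs [{}] the maxSleepTimeMs [{}] the maxRetries [{}]",
                  baseSleepTimeMs, maxSleepTimeMs, maxRetries);
    }
}
```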