[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625931#comment-14625931
 ] 

ASF GitHub Bot commented on STORM-643:
--

Github user alexsobrino commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121146615
  
+1


 KafkaUtils repeatedly fetches messages whose offset is out of range
 ---

 Key: STORM-643
 URL: https://issues.apache.org/jira/browse/STORM-643
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5
Reporter: Xin Wang
Assignee: Xin Wang
Priority: Minor

 KafkaUtils repeatedly fetches messages whose offset is out of range.
 This happens when the failed list (SortedSet<Long> failed) is not empty and
 some offset in it is out of range.
 {code}
 [worker-log]
 2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 ...
 {code}
 [FIX]
 {code}
 storm.kafka.PartitionManager.fill():
 ...
 try {
     msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
 } catch (UpdateOffsetException e) {
     _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic,
         _partition.partition, _spoutConfig);
     LOG.warn("Using new offset: {}", _emittedToOffset);
     // fetch failed, so don't update the metrics
     // fix bug: remove this offset from the failed list when it is out of range
     if (had_failed) {
         failed.remove(offset);
     }
     return;
 }
 ...
 {code}
 also: the log message "retrying with default start offset time from
 configuration. configured start offset time: [-2]" is incorrect.





[GitHub] storm pull request: [STORM-935] Update Disruptor queue version to ...

2015-07-14 Thread HeartSaVioR
Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/630#issuecomment-121150967
  
@errordaiwa @amontalenti 
A 1000ms timeout makes sense to me.
Actually a 100ms timeout also makes sense to me, but I'd like to hear
opinions about the load-aware approach.

We have an option to set the timeout now, so it should be no issue.
I'll run some performance tests and check that no tuples fail.




[jira] [Commented] (STORM-935) Update Disruptor queue version to 2.10.4

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625967#comment-14625967
 ] 

ASF GitHub Bot commented on STORM-935:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/630#issuecomment-121150967
  
@errordaiwa @amontalenti 
A 1000ms timeout makes sense to me.
Actually a 100ms timeout also makes sense to me, but I'd like to hear
opinions about the load-aware approach.

We have an option to set the timeout now, so it should be no issue.
I'll run some performance tests and check that no tuples fail.


 Update Disruptor queue version to 2.10.4
 

 Key: STORM-935
 URL: https://issues.apache.org/jira/browse/STORM-935
 Project: Apache Storm
  Issue Type: Dependency upgrade
Affects Versions: 0.11.0
Reporter: Xingyu Su

 Storm still uses an old version of the Disruptor queue (ver 2.10.1). This
 version has some potential race problems, which version 2.10.4 has fixed.
 https://issues.apache.org/jira/browse/STORM-503 will benefit from this
 update.





[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread alexsobrino
Github user alexsobrino commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121146615
  
+1




[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread tedxia
Github user tedxia commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121188995
  
In branch 0.10.0, failed tuples are managed by
ExponentialBackoffMsgRetryManager; should we also make this change in 0.10.0?




[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread mvalleavila
Github user mvalleavila commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121189676
  
We are reproducing the issue too, +1
Thx!




[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626140#comment-14626140
 ] 

ASF GitHub Bot commented on STORM-643:
--

Github user tedxia commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121188995
  
In branch 0.10.0, failed tuples are managed by
ExponentialBackoffMsgRetryManager; should we also make this change in 0.10.0?


 KafkaUtils repeatedly fetches messages whose offset is out of range
 ---

 Key: STORM-643
 URL: https://issues.apache.org/jira/browse/STORM-643
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5
Reporter: Xin Wang
Assignee: Xin Wang
Priority: Minor

 KafkaUtils repeatedly fetches messages whose offset is out of range.
 This happens when the failed list (SortedSet<Long> failed) is not empty and
 some offset in it is out of range.
 {code}
 [worker-log]
 2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 ...
 {code}
 [FIX]
 {code}
 storm.kafka.PartitionManager.fill():
 ...
 try {
     msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
 } catch (UpdateOffsetException e) {
     _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic,
         _partition.partition, _spoutConfig);
     LOG.warn("Using new offset: {}", _emittedToOffset);
     // fetch failed, so don't update the metrics
     // fix bug: remove this offset from the failed list when it is out of range
     if (had_failed) {
         failed.remove(offset);
     }
     return;
 }
 ...
 {code}
 also: the log message "retrying with default start offset time from
 configuration. configured start offset time: [-2]" is incorrect.





[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread ellull
Github user ellull commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121180650
  
We are also facing this issue, so +1




[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626112#comment-14626112
 ] 

ASF GitHub Bot commented on STORM-643:
--

Github user ellull commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121180650
  
We are also facing this issue, so +1


 KafkaUtils repeatedly fetches messages whose offset is out of range
 ---

 Key: STORM-643
 URL: https://issues.apache.org/jira/browse/STORM-643
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5
Reporter: Xin Wang
Assignee: Xin Wang
Priority: Minor

 KafkaUtils repeatedly fetches messages whose offset is out of range.
 This happens when the failed list (SortedSet<Long> failed) is not empty and
 some offset in it is out of range.
 {code}
 [worker-log]
 2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 ...
 {code}
 [FIX]
 {code}
 storm.kafka.PartitionManager.fill():
 ...
 try {
     msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
 } catch (UpdateOffsetException e) {
     _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic,
         _partition.partition, _spoutConfig);
     LOG.warn("Using new offset: {}", _emittedToOffset);
     // fetch failed, so don't update the metrics
     // fix bug: remove this offset from the failed list when it is out of range
     if (had_failed) {
         failed.remove(offset);
     }
     return;
 }
 ...
 {code}
 also: the log message "retrying with default start offset time from
 configuration. configured start offset time: [-2]" is incorrect.





[GitHub] storm pull request: STORM-67 Provide API for spouts to know how ma...

2015-07-14 Thread bourneagain
Github user bourneagain commented on the pull request:

https://github.com/apache/storm/pull/593#issuecomment-121281726
  
Thanks @HeartSaVioR. We can have this merged to master whenever we feel
it's appropriate.




[jira] [Commented] (STORM-67) Provide API for spouts to know how many pending messages there are

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626476#comment-14626476
 ] 

ASF GitHub Bot commented on STORM-67:
-

Github user bourneagain commented on the pull request:

https://github.com/apache/storm/pull/593#issuecomment-121281726
  
Thanks @HeartSaVioR. We can have this merged to master whenever we feel
it's appropriate.


 Provide API for spouts to know how many pending messages there are
 --

 Key: STORM-67
 URL: https://issues.apache.org/jira/browse/STORM-67
 Project: Apache Storm
  Issue Type: New Feature
Reporter: James Xu
Assignee: Shyam Rajendran
  Labels: newbie

 https://github.com/nathanmarz/storm/issues/343
 This would be useful in case you want to take special action in the spout,
 like dropping messages.
 -
 Discmt: Hi, I'd like to try and take a crack at this if it's still relevant. 
 I'm not exactly sure what it's asking for though. It seems to me an 
 implementation for knowing how many pending messages there are for a spout 
 depends on where the spout is getting its information from, which makes me 
 sure I'm missing something.
 -
 revans2: The spout code in backtype/storm/daemon/executor.clj is already 
 keeping track of the pending tuples if acking is enabled. If acking is 
 disabled nothing is pending.
 defmethod mk-threads :spout [executor-data task-datas]
 defines pending as a RotatingMap which maps all of the storm internal tuple 
 ids to the message id objects passed in by the spout when it first emitted 
 the tuple. The hardest part should be getting pending to a place where the 
 ISpoutOutputCollector implementation or where ever the API is, can get access 
 to it.
 -
 ptgoetz: @Discmt Yes, this is still relevant and would be nice to have.
 The Storm framework asks spouts for tuples by calling the nextTuple() method 
 and keeps track of the tuple tree internally. The underlying data source does 
 not come into play.
 As implied by @revans2, one approach would be to add a method to 
 ISpoutOutputCollector such as getPendingCount() that would allow spout 
 implementations to query for the pending count (probably returning -1 if 
 acking is disabled). The tricky part will likely be bridging the gap between 
 executor.clj and the ISpoutOutputCollector implementation(s). I haven't dug 
 very deeply into the code, so off-hand I don't know how hard that would be. A 
 quick search of the code for TOPOLOGY_MAX_PENDING should point you to some of 
 the touch points.
 Also keep in mind the dual meaning of TOPOLOGY_MAX_PENDING. In a standard 
 storm topology it represents the maximum number of outstanding tuples. In a 
 trident topology it represents the maximum number of outstanding batches.
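 To make the proposal concrete, here is a minimal sketch (a hypothetical
 shape, not the actual Storm API: getPendingCount and the -1 convention come
 from the discussion above, the rest is assumed) of what the collector-side
 interface could look like:
 {code}
 import java.util.List;

 // Hypothetical sketch of the proposed addition discussed above.
 public interface ISpoutOutputCollector {
     List<Integer> emit(String streamId, List<Object> tuple, Object messageId);
     void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId);
     void reportError(Throwable error);

     // Proposed: number of emitted tuples still awaiting ack/fail,
     // or -1 when acking is disabled (nothing is ever pending then).
     long getPendingCount();
 }
 {code}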
 -
 Discmt: Hey guys. I've been taking time to look into it, and I feel like I 
 might have an understanding of what exactly it is I need to do. If what 
 @revans2 said is true, and all pending messages are kept within that 
 RotatingMap then this should be somewhat straightforward. I am trying to 
 compile my own storm.jar file right now but I haven't figured out how. I
 tried using build_release.sh in the bin folder, but I had no luck. I also
 tried using
 lein jar
 -
 xumingming: try the following:
 lein sub install
 lein install
 after these commands are executed, there should be a jar file named 
 storm-xxx.jar in $STORM_HOME/target/.
 -
 Discmt: @xumingming. Thanks for the advice. I found that I had Leiningen 1,
 but the minimum required is Leiningen 2.
 -
 xumingming: yeah, storm requires lein 2 to build: 
 https://github.com/nathanmarz/storm/blob/master/project.clj#L14
 -
 Discmt: Hi guys. I got my development environment squared away and I can 
 properly build releases now. I use the build_release.sh script. I tried 
 making a change the way @ptgoetz and @revans2 had suggested by adding a 
 method to the output collector to return the pending count. I have some 
 questions about it.
 I noticed most of the collector implementations rely on a delegate, or 
 mediator, which I'm assuming is defined here: 
 https://github.com/nathanmarz/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/executor.clj#L504-515.
 So if I add a method to get the size of pending, defined here 
 https://github.com/nathanmarz/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/executor.clj#L408-414,
  like so:
  (SpoutOutputCollector.
   (reify ISpoutOutputCollector
     (^int getPendingCount [this]
       (.size pending))
     (^List emit [this ^String stream-id ^List tuple ^Object 
 

[jira] [Commented] (STORM-615) Add REST API to upload topology

2015-07-14 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626548#comment-14626548
 ] 

Sriharsha Chintalapani commented on STORM-615:
--

[~revans2] do you have any suggestions on the above approach?

 Add REST API to upload topology
 ---

 Key: STORM-615
 URL: https://issues.apache.org/jira/browse/STORM-615
 Project: Apache Storm
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Arun Mahadevan
 Fix For: 0.10.0


 Add REST api /api/v1/submitTopology to upload topology jars and config using 
 REST api.





[GitHub] storm pull request: [STORM-935] Update Disruptor queue version to ...

2015-07-14 Thread errordaiwa
Github user errordaiwa commented on the pull request:

https://github.com/apache/storm/pull/630#issuecomment-121140113
  
I did a performance test using storm 0.9.3 with disruptor queue 2.10.4. The
target topology is mentioned in
[STORM-503](https://github.com/apache/storm/pull/625). To make the result
clearer, I raised the bolt count to 1000.

Here is the CPU usage.
+ base (Storm running with nothing)
  + user: 2%
  + sys: 1.5%
+ no timeout
  + user: 4%
  + sys: 1.5%
+ 1000ms timeout
  + user: 5%
  + sys: 2%
+ 100ms timeout
  + user: 6%
  + sys: 4.5%
+ 10ms timeout
  + user: 17%
  + sys: 26%
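
For intuition on why the CPU cost scales this way: an idle consumer that
waits with a timeout wakes up roughly 1000/timeoutMs times per second, so a
10ms timeout does about 100x the idle wakeups of a 1000ms one, mostly in
kernel (sys) time. An illustrative, self-contained sketch using plain
java.util.concurrent (not the Disruptor API):

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: timed waits on an empty queue become periodic wakeups,
// and smaller timeouts mean proportionally more of them.
public class TimedWaitConsumer {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        long timeoutMs = Long.parseLong(args.length > 0 ? args[0] : "1000");
        while (!Thread.currentThread().isInterrupted()) {
            String msg = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (msg != null) {
                System.out.println("handled: " + msg);
            }
            // On timeout the loop simply re-arms the wait; watch idle CPU
            // while varying the timeout argument (e.g. 1000 vs 10).
        }
    }
}
{code}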





[jira] [Commented] (STORM-615) Add REST API to upload topology

2015-07-14 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626595#comment-14626595
 ] 

Robert Joseph Evans commented on STORM-615:
---

Using policy files would work to prevent the code from doing bad things in the 
OS as a privileged user.  But I don't think it solves the issue of 
authentication with nimbus.  No matter how we run the user code it still 
needs to authenticate with nimbus, so we need to give that code credentials 
to do so.  We cannot use the UI user's credentials because the end user could 
steal them, unless we do something where we hand the code a nimbus connection 
that is already authenticated and locked down in such a way that nimbus will 
enforce it being the user that we want.  But that code does not currently 
exist, on either the client side or the nimbus side.

If we are going to make big changes like that I would much rather have us look 
at flux, and see if we can submit a topology with a jar, and a config file.  
Possibly having both of them in a single jar file.  Instead of having the bolts 
and spouts deserialized in the worker, we could call a constructor and 
instantiate them directly in the worker, like what flux does.  There is already 
a thrift definition for some of this, but I am not sure how advanced/tested it 
is, or what changes we would need to make to flux to support it.  With this we 
no longer need to run any user code outside of the worker at all, or load an 
untrusted jar file.  We just read the config file and submit the topology using 
the proxy settings.

 Add REST API to upload topology
 ---

 Key: STORM-615
 URL: https://issues.apache.org/jira/browse/STORM-615
 Project: Apache Storm
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Arun Mahadevan
 Fix For: 0.10.0


 Add REST api /api/v1/submitTopology to upload topology jars and config using 
 REST api.





[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626643#comment-14626643
 ] 

ASF GitHub Bot commented on STORM-937:
--

GitHub user rfarivar opened a pull request:

https://github.com/apache/storm/pull/631

[STORM-937] Changing the log level from info to debug

This is to reduce the noisiness of supervisor logs. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rfarivar/storm STORM-937

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #631


commit 2d030413ab651140a3dd3655673b7bb81c1ce202
Author: rfarivar rfari...@yahoo-inc.com
Date:   2015-07-14T16:51:59Z

Changing the log level from info to debug




 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.





[jira] [Created] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread Reza Farivar (JIRA)
Reza Farivar created STORM-937:
--

 Summary: StormBoundedExponentialBackoffRetry too noisy, lower log 
level
 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor


The supervisor logs are currently overpopulated with log messages similar to 
this: 

2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]

The log level in the StormBoundedExponentialBackoffRetry is currently at info 
level. It seems it can be safely lowered to debug.
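
For reference, a minimal sketch of the proposed change (class layout and
parameter names assumed from the log line above, not copied from the actual
source):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch: demote the construction-time message from INFO to DEBUG so
// supervisors stay quiet unless debug logging is enabled.
public class StormBoundedExponentialBackoffRetrySketch {
    private static final Logger LOG =
            LoggerFactory.getLogger(StormBoundedExponentialBackoffRetrySketch.class);

    public StormBoundedExponentialBackoffRetrySketch(int baseSleepTimeMs,
                                                     int maxSleepTimeMs,
                                                     int maxRetries) {
        // was LOG.info(...) before the patch
        LOG.debug("The baseSleepTimeMs [{}] the maxSleepTimeMs [{}] the maxRetries [{}]",
                baseSleepTimeMs, maxSleepTimeMs, maxRetries);
    }
}
{code}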





[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread Reza Farivar (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626644#comment-14626644
 ] 

Reza Farivar commented on STORM-937:


Pull Request https://github.com/apache/storm/pull/631

 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.





[jira] [Issue Comment Deleted] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread Reza Farivar (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza Farivar updated STORM-937:
---
Comment: was deleted

(was: Pull Request https://github.com/apache/storm/pull/631)

 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.





Re: A question about setup-default-uncaught-exception-handler function

2015-07-14 Thread Kishorkumar Patil
Hi Chuanlei,
The setup-default-uncaught-exception-handler fails fast for both OOM and other 
uncaught exceptions. The difference in treatment is Runtime.halt vs 
Runtime.exit (other uncaught exceptions are handled there). 

In the case of an OOM Error, we shut down the JVM using Runtime.halt. In other 
cases, we call Runtime.exit, which invokes all registered shutdown hooks, 
giving other parts a chance to finalize gracefully. 

Calling Runtime.halt is an extreme measure, as it shuts down the system without 
calling any shutdown hooks. This extreme step is essential for OOM because an 
attempt to handle the error can itself rethrow more OOMs.
So to answer your question in short: we are failing fast; running the other 
shutdown hooks or not is the only difference.
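
As a minimal, self-contained sketch of that policy (plain Java; Storm's
actual implementation is in Clojure, so this is only an illustration):

{code}
// Halt immediately on OOM (skipping shutdown hooks, since handling OOM can
// itself throw more OOMs); otherwise exit so registered shutdown hooks run.
public final class FailFastHandler implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof OutOfMemoryError) {
            Runtime.getRuntime().halt(1);   // no shutdown hooks
        } else {
            Runtime.getRuntime().exit(2);   // runs shutdown hooks, then dies
        }
    }

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new FailFastHandler());
    }
}
{code}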

-Kishor




 


 On Tuesday, July 14, 2015 10:40 AM, Chuanlei Ni nichuan...@gmail.com 
wrote:
   

 Hi,
  I want to know why setup-default-uncaught-exception-handler only deals
with the OOM error, since fast failure is the philosophy of Storm's design.
When a thread crashes in one Storm process, the process mostly loses its
functionality.
Why not exit the whole process when an uncaught exception happens? If we
handled exceptions that way, we could remove a lot of operational labor for
Storm.

Thanks in advance!


  

[GitHub] storm pull request: [STORM-937] Changing the log level from info t...

2015-07-14 Thread rfarivar
GitHub user rfarivar opened a pull request:

https://github.com/apache/storm/pull/631

[STORM-937] Changing the log level from info to debug

This is to reduce the noisiness of supervisor logs. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rfarivar/storm STORM-937

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #631


commit 2d030413ab651140a3dd3655673b7bb81c1ce202
Author: rfarivar rfari...@yahoo-inc.com
Date:   2015-07-14T16:51:59Z

Changing the log level from info to debug






[GitHub] storm pull request: Storm 763/839 0.10.x

2015-07-14 Thread knusbaum
Github user knusbaum commented on a diff in the pull request:

https://github.com/apache/storm/pull/617#discussion_r34599960
  
--- Diff: storm-core/src/jvm/backtype/storm/messaging/netty/Client.java ---
@@ -59,20 +59,16 @@
  * - Connecting and reconnecting are performed asynchronously.
  * - Note: The current implementation drops any messages that are 
being enqueued for sending if the connection to
  *   the remote destination is currently unavailable.
- * - A background flusher thread is run in the background.  It will, at 
fixed intervals, check for any pending messages
- *   (i.e. messages buffered in memory) and flush them to the remote 
destination iff background flushing is currently
- *   enabled.
  */
 public class Client extends ConnectionWithStatus implements 
IStatefulObject {
+private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 60L;
--- End diff --

This seems like an incredibly large timeout.




[GitHub] storm pull request: Storm 763/839 0.10.x

2015-07-14 Thread eshioji
Github user eshioji commented on a diff in the pull request:

https://github.com/apache/storm/pull/617#discussion_r34602607
  
--- Diff: storm-core/src/jvm/backtype/storm/messaging/netty/Client.java ---
@@ -59,20 +59,16 @@
  * - Connecting and reconnecting are performed asynchronously.
  * - Note: The current implementation drops any messages that are 
being enqueued for sending if the connection to
  *   the remote destination is currently unavailable.
- * - A background flusher thread is run in the background.  It will, at 
fixed intervals, check for any pending messages
- *   (i.e. messages buffered in memory) and flush them to the remote 
destination iff background flushing is currently
- *   enabled.
  */
 public class Client extends ConnectionWithStatus implements 
IStatefulObject {
+private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 60L;
--- End diff --

Maybe; this value was inherited from the current code, though. Perhaps @miguno 
can shed light on the rationale?




[jira] [Updated] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A

2015-07-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-763:
--
Assignee: Enno Shioji

 nimbus reassigned worker A to another machine, but other worker's netty 
 client can't connect to the new worker A 
 -

 Key: STORM-763
 URL: https://issues.apache.org/jira/browse/STORM-763
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 0.9.4
 Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
Reporter: 3in
Assignee: Enno Shioji

 Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
 my topology has 50+ workers; it can't emit 5 thousand tuples in ten
 minutes.
 sometimes one worker is reassigned to another machine by nimbus because of 
 task heartbeat timeout:
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[440 440] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[90 90] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[510 510] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[160 160] not alive
 I can see in the storm UI that the reassigned worker has already started,
 but other workers write error logs all the time:
 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 The worker on the destination host has already started, and I can telnet to
 192.168.163.19 5700.
 So why can't the netty client connect to the ip:port?





[jira] [Commented] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A

2015-07-14 Thread Enno Shioji (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626944#comment-14626944
 ] 

Enno Shioji commented on STORM-763:
---

[~revans2] Yay, thank you! I'll ping [~ptgoetz] on the dev. mailing list.

 nimbus reassigned worker A to another machine, but other worker's netty 
 client can't connect to the new worker A 
 -

 Key: STORM-763
 URL: https://issues.apache.org/jira/browse/STORM-763
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 0.9.4
 Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
Reporter: 3in
Assignee: Enno Shioji
 Fix For: 0.9.6


 Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
 my topology has 50+ workers; it can't emit 5 thousand tuples in ten
 minutes.
 sometimes one worker is reassigned to another machine by nimbus because of 
 task heartbeat timeout:
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[440 440] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[90 90] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[510 510] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[160 160] not alive
 I can see in the storm UI that the reassigned worker has already started,
 but other workers write error logs all the time:
 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 The worker on the destination host has already started, and I can telnet to
 192.168.163.19 5700.
 So why can't the netty client connect to the ip:port?





Re: 0.9.6 release for STORM-763/839?

2015-07-14 Thread 임정택
In addition to Enno's mail, I'd like to know whether we'd like to maintain
three version lines (currently 0.9.x, 0.10.x, 0.11.x) continuously, or
whether this is just a period of transition.

I'm maintaining three version lines (bugfix, next minor, next major,
respecting semver) in another project, and sometimes maintaining them is
really annoying, though most of the issues are breaking changes between the
next major version and the current one.

Since the stable version of Storm is 0.9.5, and some users want to upgrade
Storm with minimized impact, releasing 0.9.6 could make sense for now.
(It may be better to collect and backport some bugfixes before releasing
0.9.6, since the other version lines already have many bugfixes which can be
applied to 0.9.x.)

But I also think we should make an effort to make 0.10.0 stable and release
an official version.
Storm 0.10.0-beta was released one month ago, and we don't have a plan to
release the next beta or a stable version.
After releasing 0.10.0 we can choose whether to keep maintaining the 0.9.x
line.

If it's possible to release a stable version of 0.10.0 sooner, for example
before the end of this month, then we can defer any bugfix issues (except
critical / blocker) to 0.10.0.

tl;dr:
We can make an effort to make 0.10.0 stable and release an official version
faster.
If a stable version of 0.10.0 can be released shortly, I would not want to
release any new 0.9.x versions except for critical or blocker bugs.

Thanks,
Jungtaek Lim (HeartSaVioR)


2015-07-15 5:04 GMT+09:00 Enno Shioji eshi...@gmail.com:

 Hi Taylor,


 @Bobby recommended I talk to you about a potential 0.9.6 release to fix
 STORM-763/839. The fixes are already merged to master, 0.10.x and 0.9.x.

 In a nutshell, they fix the following symptoms -- but the catch is that
 these symptoms only surface when connections between bolts are lost
 frequently. So I'm not sure whether it warrants a release. The symptoms
 are:
  - Thread deadlock hazard in Netty Client (STORM-839)
  - Slow reconnection (STORM-763)
  - Verbose error log (STORM-763)

 Anyways, just thought I'd ping you. Btw thanks for your work, we are using
 Storm extensively and it's been amazing!


 Enno




-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior


[GitHub] storm pull request: [STORM-937] Changing the log level from info t...

2015-07-14 Thread nathanmarz
Github user nathanmarz commented on the pull request:

https://github.com/apache/storm/pull/631#issuecomment-121403303
  
+1




[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627113#comment-14627113
 ] 

ASF GitHub Bot commented on STORM-937:
--

Github user nathanmarz commented on the pull request:

https://github.com/apache/storm/pull/631#issuecomment-121403303
  
+1


 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.





[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627098#comment-14627098
 ] 

ASF GitHub Bot commented on STORM-937:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/631#issuecomment-121400695
  
+1


 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.





[GitHub] storm pull request: [STORM-937] Changing the log level from info t...

2015-07-14 Thread knusbaum
Github user knusbaum commented on the pull request:

https://github.com/apache/storm/pull/631#issuecomment-121365995
  
+1




0.9.6 release for STORM-763/839?

2015-07-14 Thread Enno Shioji
Hi Taylor,


@Bobby recommended I talk to you about a potential 0.9.6 release to fix
STORM-763/839. The fixes are already merged to master, 0.10.x and 0.9.x.

In a nutshell, they fix the following symptoms -- but the catch is that
these symptoms only surface when connections between bolts are lost
frequently. So I'm not sure whether it warrants a release. The symptoms are:
 - Thread deadlock hazard in Netty Client (STORM-839)
 - Slow reconnection (STORM-763)
 - Verbose error log (STORM-763)

Anyways, just thought I'd ping you. Btw thanks for your work, we are using
Storm extensively and it's been amazing!


Enno


[jira] [Resolved] (STORM-839) Deadlock hazard in backtype.storm.messaging.netty.Client

2015-07-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-839.
---
   Resolution: Fixed
 Assignee: Enno Shioji
Fix Version/s: 0.9.6

Thanks [~eshioji],

I merged this into master, branch-0.10.x and branch-0.9.x.  You may want to 
talk to [~ptgoetz] about doing a 0.9.6 release.

 Deadlock hazard in backtype.storm.messaging.netty.Client
 

 Key: STORM-839
 URL: https://issues.apache.org/jira/browse/STORM-839
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 0.9.4
Reporter: Enno Shioji
Assignee: Enno Shioji
Priority: Critical
 Fix For: 0.9.6


 See the thread dump below that shows the deadlock. client-worker-1 is holding 
 7b5a7fa5 and waiting on 1446a1e9. Thread-10-disruptor-worker-transfer-queue 
 is holding 1446a1e9 and is waiting on 7b5a7fa5.
 (Thread dump is truncated to show only the relevant parts)
 2015-05-28 15:37:15
 Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode):
 Thread-10-disruptor-worker-transfer-queue - Thread t@52
java.lang.Thread.State: BLOCKED
   at 
 org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
   - waiting to lock 7b5a7fa5 (a java.lang.Object) owned by 
 client-worker-1 t@25
   at 
 org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
   at 
 org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
   at 
 org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
   at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
   at 
 org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
   at 
 org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
   at 
 org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
   at 
 org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
   at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
   at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
   at 
 org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
   at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
   - locked 1446a1e9 (a backtype.storm.messaging.netty.Client)
   at backtype.storm.messaging.netty.Client.send(Client.java:412)
   - locked 1446a1e9 (a backtype.storm.messaging.netty.Client)
   at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
   at 
 backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014$fn__5015.invoke(worker.clj:334)
   at 
 backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014.invoke(worker.clj:332)
   at 
 backtype.storm.disruptor$clojure_handler$reify__1446.onEvent(disruptor.clj:58)
   at 
 backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
   at 
 backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
   at 
 backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
   at 
 backtype.storm.disruptor$consume_loop_STAR_$fn__1459.invoke(disruptor.clj:94)
   at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
   at clojure.lang.AFn.run(AFn.java:24)
   at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
   - None
 client-worker-1 - Thread t@25
java.lang.Thread.State: BLOCKED
   at 
 backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
   - waiting to lock 1446a1e9 (a backtype.storm.messaging.netty.Client) 
 owned by Thread-10-disruptor-worker-transfer-queue t@52
   at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
   at 
 backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
   at 
 org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
   at 
 org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
   at 
 org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
   at 
 org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:437)
   - locked 7b5a7fa5 (a java.lang.Object)
   at 
 org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
   at 
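 (The dump is truncated above.) As a minimal, runnable illustration of the
 hazard it shows -- two threads acquiring the same two monitors in opposite
 order -- here is a self-contained sketch (plain Java, not Storm code; it
 deadlocks when run):
 {code}
 public class LockOrderDeadlock {
     private static final Object clientLock = new Object();  // stands in for 1446a1e9
     private static final Object channelLock = new Object(); // stands in for 7b5a7fa5

     public static void main(String[] args) {
         // transfer-queue thread: takes the Client monitor, then needs the channel's
         Thread transfer = new Thread(() -> {
             synchronized (clientLock) {
                 pause();
                 synchronized (channelLock) { }
             }
         }, "transfer-queue");
         // netty worker thread: takes the channel's monitor, then needs the Client's
         Thread worker = new Thread(() -> {
             synchronized (channelLock) {
                 pause();
                 synchronized (clientLock) { }
             }
         }, "client-worker-1");
         transfer.start();
         worker.start();
     }

     private static void pause() {
         try { Thread.sleep(100); } catch (InterruptedException ignored) { }
     }
 }
 {code}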
 

[jira] [Updated] (STORM-615) Add REST API to upload topology

2015-07-14 Thread Jungtaek Lim (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-615:
---
Fix Version/s: (was: 0.10.0)

 Add REST API to upload topology
 ---

 Key: STORM-615
 URL: https://issues.apache.org/jira/browse/STORM-615
 Project: Apache Storm
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Arun Mahadevan

 Add REST api /api/v1/submitTopology to upload topology jars and config using 
 REST api.





[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627459#comment-14627459
 ] 

ASF GitHub Bot commented on STORM-643:
--

Github user vesense commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121473865
  
@HeartSaVioR @tedxia This PR is for branch 0.9.x. 0.9.x is very different
from 0.10.x and master, so when we do the cherry-pick there are a lot of
conflicts. In branch 0.10.0, failed tuples are managed by
ExponentialBackoffMsgRetryManager, so we should test and verify whether we can
reproduce the issue in 0.10.0.



 KafkaUtils repeatedly fetches messages whose offset is out of range
 ---

 Key: STORM-643
 URL: https://issues.apache.org/jira/browse/STORM-643
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5
Reporter: Xin Wang
Assignee: Xin Wang
Priority: Minor

 KafkaUtils repeatedly fetches messages whose offset is out of range.
 This happens when the failed list (SortedSet<Long> failed) is not empty and
 some offset in it is out of range.
 {code}
 [worker-log]
 2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 ...
 {code}
 [FIX]
 {code}
 storm.kafka.PartitionManager.fill():
 ...
 try {
     msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
 } catch (UpdateOffsetException e) {
     _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic,
         _partition.partition, _spoutConfig);
     LOG.warn("Using new offset: {}", _emittedToOffset);
     // fetch failed, so don't update the metrics
     // fix bug: remove this offset from the failed list when it is out of range
     if (had_failed) {
         failed.remove(offset);
     }
     return;
 }
 ...
 {code}
 also: the log message "retrying with default start offset time from
 configuration. configured start offset time: [-2]" is incorrect.





[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread vesense
Github user vesense commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121473865
  
@HeartSaVioR @tedxia This PR is for branch 0.9.x. 0.9.x is very different
from 0.10.x and master, so when we do the cherry-pick there are a lot of
conflicts. In branch 0.10.0, failed tuples are managed by
ExponentialBackoffMsgRetryManager, so we should test and verify whether we can
reproduce the issue in 0.10.0.





Re: A question about setup-default-uncaught-exception-handler function

2015-07-14 Thread Chuanlei Ni
OK, understood. Thank you very much.

And sorry for reading the code halfheartedly!

Thank you again!

2015-07-15 1:17 GMT+08:00 Kishorkumar Patil kpa...@yahoo-inc.com.invalid:

 Hi Chuanlei,
 The setup-default-uncaught-exception-handler fails fast for both OOM and
 other uncaught exceptions. The difference in treatment is Runtime.halt vs
 Runtime.exit (other uncaught exceptions are handled there).

 In the case of an OOM Error, we shut down the JVM using Runtime.halt. In
 other cases, we call Runtime.exit, which invokes all registered shutdown
 hooks, giving other parts a chance to finalize gracefully.

 Calling Runtime.halt is an extreme measure, as it shuts down the system
 without calling any shutdown hooks. This extreme step is essential for OOM
 because an attempt to handle the error can itself rethrow more OOMs.
 So to answer your question in short: we are failing fast; running the other
 shutdown hooks or not is the only difference.

 -Kishor







  On Tuesday, July 14, 2015 10:40 AM, Chuanlei Ni nichuan...@gmail.com
 wrote:


  Hi,
   I want to know why setup-default-uncaught-exception-handler only deals
 with the OOM error, since fast failure is the philosophy of Storm's design.
 When a thread crashes in one Storm process, the process mostly loses its
 functionality.
 Why not exit the whole process when an uncaught exception happens? If we
 handled exceptions that way, we could remove a lot of operational labor for
 Storm.

 Thanks in advance!






Re: 0.9.6 release for STORM-763/839?

2015-07-14 Thread P. Taylor Goetz
My opinion:

In terms of supporting the 0.9.x release line, I think that’s critical, at the 
very least until 0.10.0 stable is released. More likely for some time 
thereafter for those who don’t want to (or can’t) upgrade.

Updates to the 0.9.x line should be limited to bug fixes.

I’d like to release a 0.10.0-beta2 or 0.10.0 soon. I haven’t seen much in terms 
of end user feedback on the beta. A few user/dev success stories would make me 
feel better about dropping the beta tag.

As always, I’m open to any and all opinions.

-Taylor


 On Jul 14, 2015, at 5:43 PM, 임정택 kabh...@gmail.com wrote:
 
 In addition to Enno's mail, I'd like to know whether we'd like to maintain
 three version lines (currently 0.9.x, 0.10.x, 0.11.x) continuously, or
 whether this is just a period of transition.
 
 I'm maintaining three version lines (bugfix, next minor, next major,
 respecting semver) in another project, and sometimes maintaining them is
 really annoying, though most of the issues are breaking changes between the
 next major version and the current one.
 
 Since the stable version of Storm is 0.9.5, and some users want to upgrade
 Storm with minimized impact, releasing 0.9.6 could make sense for now.
 (It may be better to collect and backport some bugfixes before releasing
 0.9.6, since the other version lines already have many bugfixes which can be
 applied to 0.9.x.)
 
 But I also think we should make an effort to make 0.10.0 stable and release
 an official version.
 Storm 0.10.0-beta was released one month ago, and we don't have a plan to
 release the next beta or a stable version.
 After releasing 0.10.0 we can choose whether to keep maintaining the 0.9.x
 line.
 
 If it's possible to release a stable version of 0.10.0 sooner, for example
 before the end of this month, then we can defer any bugfix issues (except
 critical / blocker) to 0.10.0.
 
 tl;dr:
 We can make an effort to make 0.10.0 stable and release an official version
 faster.
 If a stable version of 0.10.0 can be released shortly, I would not want to
 release any new 0.9.x versions except for critical or blocker bugs.
 
 Thanks,
 Jungtaek Lim (HeartSaVioR)
 
 
 2015-07-15 5:04 GMT+09:00 Enno Shioji eshi...@gmail.com:
 
 Hi Taylor,
 
 
 @Bobby recommended I talk to you about a potential 0.9.6 release to fix
 STORM-763/839. The fixes are already merged to master, 0.10.x and 0.9.x.
 
 In a nutshell, they fix the following symptoms -- but the catch is that
 these symptoms only surface when connections between bolts are lost
 frequently. So I'm not sure whether it warrants a release. The symptoms
 are:
 - Thread deadlock hazard in Netty Client (STORM-839)
 - Slow reconnection (STORM-763)
 - Verbose error log (STORM-763)
 
 Anyways, just thought I'd ping you. Btw thanks for your work, we are using
 Storm extensively and it's been amazing!
 
 
 Enno
 
 
 
 
 --
 Name : 임 정택
 Blog : http://www.heartsavior.net / http://dev.heartsavior.net
 Twitter : http://twitter.com/heartsavior
 LinkedIn : http://www.linkedin.com/in/heartsavior





[GitHub] storm pull request: [STORM-643] KafkaUtils repeat fetch messages w...

2015-07-14 Thread vesense
Github user vesense commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121473131
  
@miguno Sorry I took so long to respond. Like @tpiscitell already 
explained, this problem is not solved by STORM-586 and STORM-511.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (STORM-643) KafkaUtils repeatedly fetches messages whose offset is out of range

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627456#comment-14627456
 ] 

ASF GitHub Bot commented on STORM-643:
--

Github user vesense commented on the pull request:

https://github.com/apache/storm/pull/405#issuecomment-121473131
  
@miguno Sorry I took so long to respond. Like @tpiscitell already 
explained, this problem is not solved by STORM-586 and STORM-511.



 KafkaUtils repeatedly fetches messages whose offset is out of range
 ---

 Key: STORM-643
 URL: https://issues.apache.org/jira/browse/STORM-643
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.2-incubating, 0.9.3, 0.9.4, 0.9.5
Reporter: Xin Wang
Assignee: Xin Wang
Priority: Minor

 KafkaUtils repeatedly fetches messages whose offset is out of range.
 This happens when the failed list (SortedSet<Long> failed) is not empty and some 
 offset in it is OutOfRange.
 {code}
 [worker-log]
 2015-02-01 10:24:27.231+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.232+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 2015-02-01 10:24:27.333+0800 s.k.KafkaUtils [WARN] Got fetch request with 
 offset out of range: [20919071816]; retrying with default start offset time 
 from configuration. configured start offset time: [-2]
 2015-02-01 10:24:27.334+0800 s.k.PartitionManager [WARN] Using new offset: 
 20996130717
 ...
 {code}
 [FIX]
 {code}
 storm.kafka.PartitionManager.fill():
 ...
 try {
     msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
 } catch (UpdateOffsetException e) {
     _emittedToOffset = KafkaUtils.getOffset(_consumer, _spoutConfig.topic, _partition.partition, _spoutConfig);
     LOG.warn("Using new offset: {}", _emittedToOffset);
     // fetch failed, so don't update the metrics
     // fix bug: remove this offset from the failed list when it is OutOfRange
     if (had_failed) {
         failed.remove(offset);
     }
     return;
 }
 ...
 {code}
 also: the log message "retrying with default start offset time from configuration. 
 configured start offset time: [-2]" is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (STORM-918) Storm CLI could validate arguments/print usage

2015-07-14 Thread Shyam Rajendran (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Rajendran reassigned STORM-918:
-

Assignee: Shyam Rajendran

 Storm CLI could validate arguments/print usage
 --

 Key: STORM-918
 URL: https://issues.apache.org/jira/browse/STORM-918
 Project: Apache Storm
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Derek Dagit
Assignee: Shyam Rajendran
Priority: Minor
  Labels: Newbie

 It would be nice if the storm CLI printed usage information if arguments are 
 missing.
 For example, when omitting the argument to the kill sub-command, a JVM is 
 launched and an exception is thrown complaining that a topology named 'nil' is 
 not alive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-918) Storm CLI could validate arguments/print usage

2015-07-14 Thread Shyam Rajendran (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Rajendran updated STORM-918:
--
Assignee: (was: Shyam Rajendran)

 Storm CLI could validate arguments/print usage
 --

 Key: STORM-918
 URL: https://issues.apache.org/jira/browse/STORM-918
 Project: Apache Storm
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Derek Dagit
Priority: Minor
  Labels: Newbie

 It would be nice if the storm CLI printed usage information if arguments are 
 missing.
 For example, when omitting the argument to the kill sub-command, a JVM is 
 launched and an exception is thrown complaining that a topology named 'nil' is 
 not alive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (STORM-869) kafka spout cannot fetch message if log size is above fetchSizeBytes

2015-07-14 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani closed STORM-869.

Resolution: Invalid

 kafka spout cannot fetch message if log size is above fetchSizeBytes
 

 Key: STORM-869
 URL: https://issues.apache.org/jira/browse/STORM-869
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.5
Reporter: Adrian Seungjin Lee
Priority: Critical

 Let's say maxFetchSizeBytes is set to 1 megabyte; then if there exists a 
 message that is bigger than 1 MB, the kafka spout just hangs and becomes inactive.
 This happens both in the Kafka spout for bolt/spout topologies and in 
 trident spouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-869) kafka spout cannot fetch message if log size is above fetchSizeBytes

2015-07-14 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627501#comment-14627501
 ] 

Sriharsha Chintalapani commented on STORM-869:
--

[~sweetest_sj] this is not an issue with the kafka spout. It's how the kafka api 
works: your fetch.max.message.bytes should always be higher than the max message 
size.
You can check the broker's server.properties message.max.bytes and make sure you 
set fetch.max.message.bytes equal to it or higher.
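
As a hedged sketch of applying that advice on the Storm side: storm-kafka's SpoutConfig exposes fetchSizeBytes and bufferSizeBytes (names as in the 0.9.x storm-kafka API; the ZooKeeper address, topic, zk root, and sizes below are illustrative):

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class FetchSizeSetup {
    public static KafkaSpout buildSpout() {
        BrokerHosts hosts = new ZkHosts("zkhost:2181");
        SpoutConfig conf = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-id");
        conf.scheme = new SchemeAsMultiScheme(new StringScheme());
        // If the broker's message.max.bytes is, say, 2 MB, the fetch size must
        // be at least as large, or the spout can never read past an oversized
        // message and appears to hang.
        conf.fetchSizeBytes = 2 * 1024 * 1024;
        conf.bufferSizeBytes = 2 * 1024 * 1024;
        return new KafkaSpout(conf);
    }
}
```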

 kafka spout cannot fetch message if log size is above fetchSizeBytes
 

 Key: STORM-869
 URL: https://issues.apache.org/jira/browse/STORM-869
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka
Affects Versions: 0.9.5
Reporter: Adrian Seungjin Lee
Priority: Critical

 Let's say maxFetchSizeBytes is set to 1 megabyte; then if there exists a 
 message that is bigger than 1 MB, the kafka spout just hangs and becomes inactive.
 This happens both in the Kafka spout for bolt/spout topologies and in 
 trident spouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] storm pull request: STORM-918 Storm CLI could validate arguments/p...

2015-07-14 Thread bourneagain
GitHub user bourneagain opened a pull request:

https://github.com/apache/storm/pull/632

STORM-918 Storm CLI could validate arguments/print usage

Storm commands that mandate proper args now print the function's doc string
as help rather than launching a JVM and exiting with an error.

Example : 

./storm kill
Syntax: [storm kill topology-name [-w wait-time-secs]]

Kills the topology with the name topology-name. Storm will
first deactivate the topology's spouts for the duration of
the topology's message timeout to allow all messages currently
being processed to finish processing. Storm will then shutdown
the workers and clean up their state. You can override the length
of time Storm waits between deactivation and shutdown with the -w flag.

./storm get-errors
Syntax: [storm get-errors topology-name]

Get the latest error from the running topology. The returned result 
contains
the key value pairs for component-name and component-error for the 
components in error.
The result is returned in json format.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bourneagain/storm STORM-918

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #632


commit fc09bf07afca95a82cbdbb4f10d47c56bdff3918
Author: Shyam Rajendran rshyam@gmail.com
Date:   2015-07-15T05:42:41Z

STORM-918 Storm CLI could validate arguments/print usage
Storm commands that mandate proper args now print the function's doc string
rather than erroring out after creating the JVM process.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (STORM-742) Very busy ShellBolt subprocess with ACK mode cannot respond heartbeat just in time

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627496#comment-14627496
 ] 

ASF GitHub Bot commented on STORM-742:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/497#issuecomment-121484496
  
@dashengju Could you share your patch (limiting queue size) if you don't 
mind?


 Very busy ShellBolt subprocess with ACK mode cannot respond heartbeat just in 
 time
 --

 Key: STORM-742
 URL: https://issues.apache.org/jira/browse/STORM-742
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 0.9.3, 0.10.0, 0.9.4, 0.11.0
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim
Priority: Critical

 As [~dashengju] stated in STORM-738, a very busy ShellBolt subprocess cannot 
 respond to heartbeats just in time.
 Actually it's a design constraint (more details are on STORM-513 or 
 STORM-738), but ShellSpout avoids the constraint by updating its heartbeat on 
 any type of response from the subprocess.
 We can apply this approach to ShellBolt and let ShellBolt avoid the design 
 constraint, too.
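
For illustration, a minimal sketch of that approach (hypothetical names, not the actual ShellBolt code): treat any message from the subprocess as a liveness signal, so a busy subprocess that keeps emitting is never declared dead.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SubprocessWatchdog {
    private final AtomicLong lastHeard = new AtomicLong(System.currentTimeMillis());
    private final long timeoutMs;

    public SubprocessWatchdog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called by the reader thread for every message from the subprocess:
    // emit, ack, fail, log, and heartbeat alike all count as liveness.
    public void onAnyMessage() {
        lastHeard.set(System.currentTimeMillis());
    }

    // Called periodically by a timer thread; only a subprocess that has been
    // completely silent past the timeout is declared dead.
    public boolean isDead() {
        return System.currentTimeMillis() - lastHeard.get() > timeoutMs;
    }
}
```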



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (STORM-837) HdfsState ignores commits

2015-07-14 Thread Arun Mahadevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Mahadevan reassigned STORM-837:


Assignee: Arun Mahadevan

 HdfsState ignores commits
 -

 Key: STORM-837
 URL: https://issues.apache.org/jira/browse/STORM-837
 Project: Apache Storm
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Arun Mahadevan
Priority: Critical

 HdfsState works with trident, which is supposed to provide exactly-once 
 processing.  It does this in two ways: first by informing the state about 
 commits so it can be sure the data is written out, and second by having a 
 commit id, so that double commits can be handled.
 HdfsState ignores the beginCommit and commit calls, and with that ignores the 
 ids.  This means that if you use HdfsState and your worker crashes you may 
 both lose data and get some data twice.
 At a minimum the flush and file rotation should be tied to the commit in some 
 way.  The commit ID should at a minimum be written out with the data so 
 someone reading the data can have a hope of deduping it themselves.
 Also with the rotationActions it is possible for a file that was partially 
 written to be leaked and never moved to the final location, because it is not 
 rotated.  I personally think the actions are too generic for this case and 
 need to be deprecated.
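
As a hedged sketch of what tying flush/rotation to commits could look like for a Trident state (the State interface and its beginCommit/commit methods are Trident's; the writer wrapper and its methods are hypothetical):

```java
import storm.trident.state.State;

public class CommitAwareHdfsState implements State {
    // Hypothetical wrapper around the HDFS output stream, not the real
    // HdfsState internals.
    interface HdfsWriter {
        void writeCommitMarker(Long txid);
        void flushAndSync();
    }

    private final HdfsWriter writer;
    private Long currentTxId;

    public CommitAwareHdfsState(HdfsWriter writer) {
        this.writer = writer;
    }

    @Override
    public void beginCommit(Long txid) {
        // A repeated txid after a crash signals a replay; a real
        // implementation could use it to skip or dedupe the batch.
        this.currentTxId = txid;
    }

    @Override
    public void commit(Long txid) {
        // Write the txid alongside the data so readers can dedupe, then make
        // sure everything buffered for this transaction is durable.
        writer.writeCommitMarker(txid);
        writer.flushAndSync();
    }
}
```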



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (STORM-918) Storm CLI could validate arguments/print usage

2015-07-14 Thread Shyam Rajendran (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Rajendran reassigned STORM-918:
-

Assignee: Shyam Rajendran

 Storm CLI could validate arguments/print usage
 --

 Key: STORM-918
 URL: https://issues.apache.org/jira/browse/STORM-918
 Project: Apache Storm
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Derek Dagit
Assignee: Shyam Rajendran
Priority: Minor
  Labels: Newbie

 It would be nice if the storm CLI printed usage information if arguments are 
 missing.
 For example, when omitting the argument to the kill sub-command, a JVM is 
 launched and an exception is thrown complaining that a topology named 'nil' is 
 not alive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


0.9.6 release for STORM-763/839?

2015-07-14 Thread 임정택
To be more clear, I'm fine with maintaining three minor version lines
while it is a period of transition.
The one thing I'm curious about is whether we'd like to maintain two stable
version lines continuously (with one unstable).

For now, I'm fine with releasing 0.9.6 after backporting bugfixes which are
already committed to master but not yet backported.
(The same applies to 0.10.0.)

Btw, 0.10.0 has its own issue, STORM-903
https://issues.apache.org/jira/browse/STORM-903, which has been discussed but
not resolved. We may want to resolve it before the next release.

Best,
Jungtaek Lim (HeartSaVioR)

2015-07-15 11:02 GMT+09:00 P. Taylor Goetz ptgo...@gmail.com:

 My opinion:

 In terms of supporting the 0.9.x release line, I think that’s critical, at
 the very least until 0.10.0 stable is released. More likely for some time
 thereafter for those who don’t want to (or can’t) upgrade.

 Updates to the 0.9.x line should be limited to bug fixes.

 I’d like to release a 0.10.0-beta2 or 0.10.0 soon. I haven’t seen much in
 terms of end user feedback on the beta. A few user/dev success stories
 would make me feel better about dropping the beta tag.

 As always, I’m open to any and all opinions.

 -Taylor


  On Jul 14, 2015, at 5:43 PM, 임정택 kabh...@gmail.com wrote:
 
  In addition to Enno's mail, I'd like to know whether we'd like to maintain
  three version lines (currently 0.9.x, 0.10.x, 0.11.x) continuously, or
  whether it is just a period of transition.
 
  I'm maintaining three version lines (bugfix, next minor, next major - with
  respect to semver) on another project, and maintaining them is sometimes
  really annoying, though most of the issues are breaking changes between the
  next major and current.
 
  Since the stable version of Storm is 0.9.5 and some users want to upgrade
  Storm with minimized impact, releasing 0.9.6 could make sense for now.
  (It may be better to collect some bugfixes and backport them before releasing
  0.9.6, since the other version lines already have many bugfixes which can be
  applied to 0.9.x.)
 
  But I also think we should make an effort to make 0.10.0 stable and release
  an official version.
  Storm 0.10.0-beta was released one month ago, and we don't have a plan to
  release the next beta, or a stable version.
  After releasing 0.10.0 we can choose whether we maintain the 0.9.x line or
  not.
 
  If it's possible to release a stable version of 0.10.0 faster, for example
  before the end of this month, then we can delegate any bugfix issues
  (except critical / blocker) to 0.10.0.
 
  tl;dr.
  We can make an effort to stabilize 0.10.0 and release an official
  version faster.
  If a stable version of 0.10.0 could be released shortly, I would not want to
  release any new 0.9.x versions except for critical or blocker bugs.
 
  Thanks,
  Jungtaek Lim (HeartSaVioR)
 
 
  2015-07-15 5:04 GMT+09:00 Enno Shioji eshi...@gmail.com:
 
  Hi Taylor,
 
 
  @Bobby recommended I talk to you about a potential 0.9.6 release to fix
  STORM-763/839. The fixes are already merged to master, 0.10.x and 0.9.x.
 
  In a nutshell, they fix the following symptoms -- but the catch is that
  these symptoms only surface when connections between bolts are lost
  frequently. So I'm not sure whether it warrants a release. The symptoms
  are:
  - Thread deadlock hazard in Netty Client (STORM-839)
  - Slow reconnection (STORM-763)
  - Verbose error log (STORM-763)
 
  Anyways, just thought I'd ping you. Btw thanks for your work, we are
 using
  Storm extensively and it's been amazing!
 
 
  Enno
 
 
 
 
  --
  Name : 임 정택
  Blog : http://www.heartsavior.net / http://dev.heartsavior.net
  Twitter : http://twitter.com/heartsavior
  LinkedIn : http://www.linkedin.com/in/heartsavior




-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior




[GitHub] storm pull request: STORM-742 Let ShellBolt treat all messages to ...

2015-07-14 Thread HeartSaVioR
Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/497#issuecomment-121484496
  
@dashengju Could you share your patch (limiting queue size) if you don't 
mind?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.10.x

2015-07-14 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/617#discussion_r34603263
  
--- Diff: storm-core/src/jvm/backtype/storm/messaging/netty/Client.java ---
@@ -59,20 +59,16 @@
  * - Connecting and reconnecting are performed asynchronously.
  * - Note: The current implementation drops any messages that are 
being enqueued for sending if the connection to
  *   the remote destination is currently unavailable.
- * - A background flusher thread is run in the background.  It will, at 
fixed intervals, check for any pending messages
- *   (i.e. messages buffered in memory) and flush them to the remote 
destination iff background flushing is currently
- *   enabled.
  */
 public class Client extends ConnectionWithStatus implements 
IStatefulObject {
+private static final long PENDING_MESSAGES_FLUSH_TIMEOUT_MS = 60L;
--- End diff --

It is large, but it matches the code that was there before.  This is only 
used on a close, as a timeout when trying to be sure that we have sent all 
pending messages.  We should look at potentially lowering it, but I think that 
is for a different JIRA.
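
For context, a minimal sketch (hypothetical names, not the actual Client code) of the kind of bounded drain-on-close such a timeout guards:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DrainOnClose {
    private final AtomicInteger pendingMessages = new AtomicInteger();

    // Block until no messages are pending or the deadline passes, whichever
    // comes first; past the deadline the caller closes anyway and any
    // remaining messages are dropped.
    public void awaitDrain(long timeoutMs, long pollIntervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (pendingMessages.get() > 0 && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollIntervalMs);
        }
    }
}
```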


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.11.x

2015-07-14 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/616#issuecomment-121338740
  
+1 the code compiles and the tests all pass


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-763] nimbus reassigned worker A to anot...

2015-07-14 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/568#issuecomment-121338790
  
+1 the code compiles and the tests all pass


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.10.x

2015-07-14 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/617#issuecomment-121338762
  
+1 the code compiles and the tests all pass


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626850#comment-14626850
 ] 

ASF GitHub Bot commented on STORM-763:
--

Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/568#issuecomment-121338790
  
+1 the code compiles and the tests all pass


 nimbus reassigned worker A to another machine, but other worker's netty 
 client can't connect to the new worker A 
 -

 Key: STORM-763
 URL: https://issues.apache.org/jira/browse/STORM-763
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 0.9.4
 Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
Reporter: 3in

 Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
 java version 1.7.0_03
 storm 0.9.4
 cluster 50+ machines
 my topology has 50+ workers, and it can't emit 5 thousand tuples in ten 
 minutes.
 sometimes one worker is reassigned to another machine by nimbus because of 
 task heartbeat timeout:
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[440 440] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[90 90] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[510 510] not alive
 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
 my_topology-22-1428243953:[160 160] not alive
 i can see the reassigned worker is already started in storm UI, but other 
 workers write error logs all the time:
 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
 destined for Netty-Client-host_19/192.168.163.19:5700
 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to 
 Netty-Client-host_19/192.168.163.19:5700 is unavailable
 The worker on the destination host is already started, and i can telnet to 
 192.168.163.19 5700.
 So why can't the netty client connect to the ip:port?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] storm pull request: [STORM-763] nimbus reassigned worker A to anot...

2015-07-14 Thread knusbaum
Github user knusbaum commented on the pull request:

https://github.com/apache/storm/pull/568#issuecomment-121344016
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.11.x

2015-07-14 Thread knusbaum
Github user knusbaum commented on the pull request:

https://github.com/apache/storm/pull/616#issuecomment-121343845
  
+1 tests pass, examples run.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.10.x

2015-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/617


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: Storm 763/839 0.11.x

2015-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/616


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-839] Deadlock hazard in backtype.storm....

2015-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/566


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (STORM-937) StormBoundedExponentialBackoffRetry too noisy, lower log level

2015-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626756#comment-14626756
 ] 

ASF GitHub Bot commented on STORM-937:
--

Github user d2r commented on a diff in the pull request:

https://github.com/apache/storm/pull/631#discussion_r34597439
  
--- Diff: 
storm-core/src/jvm/backtype/storm/utils/StormBoundedExponentialBackoffRetry.java
 ---
@@ -44,7 +44,7 @@ public StormBoundedExponentialBackoffRetry(int baseSleepTimeMs, int maxSleepTime
         expRetriesThreshold = 1;
         while ((1 << (expRetriesThreshold + 1)) < ((maxSleepTimeMs - baseSleepTimeMs) / 2))
             expRetriesThreshold++;
-        LOG.info("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "] " +
+        LOG.debug("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "] " +
--- End diff --

It would be nice to use the parameterized call, so that Strings are not 
constructed unnecessarily:

```Java
LOG.debug("The baseSleepTimeMs [{}] ...", baseSleepTimeMs, ...);
```
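
For reference, a fully spelled-out form of that suggestion (a fragment; the parameter list mirrors the log line quoted in the issue description below):

```java
// SLF4J defers formatting until debug is enabled, so no String is built
// when debug logging is off.
LOG.debug("The baseSleepTimeMs [{}] the maxSleepTimeMs [{}] the maxRetries [{}]",
        baseSleepTimeMs, maxSleepTimeMs, maxRetries);
```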


 StormBoundedExponentialBackoffRetry too noisy, lower log level
 --

 Key: STORM-937
 URL: https://issues.apache.org/jira/browse/STORM-937
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Reza Farivar
Assignee: Reza Farivar
Priority: Minor

 The supervisor logs are currently overpopulated with log messages similar to 
 this: 
 2015-07-10 18:12:06.723 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
 baseSleepTimeMs [2000] the maxSleepTimeMs [6] the maxRetries [5]
 The log level in the StormBoundedExponentialBackoffRetry is currently at info 
 level. It seems it can be safely lowered to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] storm pull request: [STORM-937] Changing the log level from info t...

2015-07-14 Thread d2r
Github user d2r commented on a diff in the pull request:

https://github.com/apache/storm/pull/631#discussion_r34597439
  
--- Diff: 
storm-core/src/jvm/backtype/storm/utils/StormBoundedExponentialBackoffRetry.java
 ---
@@ -44,7 +44,7 @@ public StormBoundedExponentialBackoffRetry(int baseSleepTimeMs, int maxSleepTime
         expRetriesThreshold = 1;
         while ((1 << (expRetriesThreshold + 1)) < ((maxSleepTimeMs - baseSleepTimeMs) / 2))
             expRetriesThreshold++;
-        LOG.info("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "] " +
+        LOG.debug("The baseSleepTimeMs [" + baseSleepTimeMs + "] the maxSleepTimeMs [" + maxSleepTimeMs + "] " +
--- End diff --

It would be nice to use the parameterized call, so that Strings are not 
constructed unnecessarily:

```Java
LOG.debug("The baseSleepTimeMs [{}] ...", baseSleepTimeMs, ...);
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---