[ANNOUNCE] Apache Storm 2.1.1 Released

2021-10-13 Thread Ethan Li
The Apache Storm community is pleased to announce the release of
Apache Storm version 2.1.1.

Apache Storm is a distributed, fault-tolerant, and high-performance
realtime computation system that provides strong guarantees on the
processing of data. You can read more about Apache Storm on the
project website:

https://storm.apache.org/

Downloads of source and binary distributions are listed in our download section:

https://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

https://storm.apache.org/2021/10/14/storm211-released.html

Distribution artifacts are available in Maven Central at the following
coordinates:

groupId: org.apache.storm
artifactId: storm-{component}
version: 2.1.1
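For example, a Maven dependency on the client library would look like this (substitute the storm-{component} artifact you need; storm-client is the usual choice for topology code in 2.x):

```xml
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-client</artifactId>
  <version>2.1.1</version>
  <!-- provided: the Storm cluster supplies these classes at runtime -->
  <scope>provided</scope>
</dependency>
```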

The full list of changes is available here [1]. Please let us know if
you encounter any problems [2].

Regards,
The Apache Storm Team

[1] https://downloads.apache.org/storm/apache-storm-2.1.1/RELEASE_NOTES.html
[2] https://issues.apache.org/jira/browse/STORM


[ANNOUNCE] Apache Storm 1.2.4 Released

2021-10-11 Thread Ethan Li
The Apache Storm community is pleased to announce the release of
Apache Storm version 1.2.4.

Apache Storm is a distributed, fault-tolerant, and high-performance
realtime computation system that provides strong guarantees on the
processing of data. You can read more about Apache Storm on the
project website:

https://storm.apache.org/

Downloads of source and binary distributions are listed in our download section:

https://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

https://storm.apache.org/2021/10/11/storm124-released.html

Distribution artifacts are available in Maven Central at the following
coordinates:

groupId: org.apache.storm
artifactId: storm-core
version: 1.2.4

The full list of changes is available here [1]. Please let us know if
you encounter any problems [2].

Regards,
The Apache Storm Team

[1] https://downloads.apache.org/storm/apache-storm-1.2.4/RELEASE_NOTES.html
[2] https://issues.apache.org/jira/browse/STORM


[ANNOUNCE] Apache Storm 2.2.1 Released

2021-10-11 Thread Ethan Li
The Apache Storm community is pleased to announce the release of
Apache Storm version 2.2.1.

Apache Storm is a distributed, fault-tolerant, and high-performance
realtime computation system that provides strong guarantees on the
processing of data. You can read more about Apache Storm on the
project website:

https://storm.apache.org/

Downloads of source and binary distributions are listed in our download section:

https://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

https://storm.apache.org/2021/10/11/storm221-released.html

Distribution artifacts are available in Maven Central at the following
coordinates:

groupId: org.apache.storm
artifactId: storm-{component}
version: 2.2.1

The full list of changes is available here [1]. Please let us know if
you encounter any problems [2].

Regards,
The Apache Storm Team

[1] https://downloads.apache.org/storm/apache-storm-2.2.1/RELEASE_NOTES.html
[2] https://issues.apache.org/jira/browse/STORM


[ANNOUNCE] Apache Storm 2.3.0 Released

2021-09-27 Thread Ethan Li
The Apache Storm community is pleased to announce the release of Apache
Storm version 2.3.0.

Apache Storm is a distributed, fault-tolerant, and high-performance
realtime computation system that provides strong guarantees on the
processing of data. You can read more about Apache Storm on the project
website:

https://storm.apache.org/

Downloads of source and binary distributions are listed in our download
section:

https://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

https://storm.apache.org/2021/09/27/storm230-released.html

Distribution artifacts are available in Maven Central at the following
coordinates:

groupId: org.apache.storm
artifactId: storm-{component}
version: 2.3.0

The full list of changes is available here [1]. Please let us know if
you encounter any problems [2].

Regards,
The Apache Storm Team

[1] https://downloads.apache.org/storm/apache-storm-2.3.0/RELEASE_NOTES.html
[2] https://issues.apache.org/jira/browse/STORM


Re: Backpressure causing deadlock with recursive topology structure

2020-06-30 Thread Ethan Li
This is an interesting use case, but my understanding of Storm is that it only 
supports DAGs (no cycles in the topology graph).

To mitigate this issue, I think there are a few things that can be tried:

1. Enable acking and anchor each tuple before emitting, then set 
topology.max.spout.pending 
(https://github.com/apache/storm/blob/master/conf/defaults.yaml#L265) so the 
number of tuples in flight can be kept within a certain range.

2. Increase the receive queue size for executors 
(https://github.com/apache/storm/blob/master/conf/defaults.yaml#L317) so that 
backpressure is triggered less often.

3. Increase the parallelism, CPU, and memory of your components so they 
process tuples faster.


Combining the above, your topology will be less likely to hit backpressure. 
There may be other ways, but this is my current thinking. Hope it helps.
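Concretely, the knobs above map to topology configuration entries like these (the values are illustrative, not recommendations — tune them for your workload):

```yaml
# 1. Acking + a bound on tuples in flight (requires anchored, acked tuples)
topology.acker.executors: 1
topology.max.spout.pending: 500
# 2. Larger executor receive queue (must be a power of 2)
topology.executor.receive.buffer.size: 65536
# 3. More resources per component (with the resource-aware scheduler)
topology.component.resources.onheap.memory.mb: 256.0
topology.component.cpu.pcore.percent: 20.0
```

These can go in storm.yaml or be set per topology via the Config object before submission.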

-Ethan


> On May 13, 2020, at 11:57 AM, Simon Cooper  
> wrote:
> 
> Hi,
>  
> We've encountered a problem with the new backpressure system introduced in 
> Storm2. We've got two mutually recursive bolts in our topology (BoltA sends 
> tuples to BoltB, which sends tuples back to BoltA). This worked fine in 
> Storm1, but causes the topology to randomly deadlock on Storm2.
>  
> When BoltA task starts to bottleneck and sets the backpressure flag, this 
> sends a signal to the worker running the corresponding BoltB task. This stops 
> that task sending any more tuples until the flag is cleared. This also means 
> the bolt cannot process any new tuples. This causes the input queue of BoltB 
> task to fill up, which then sets the backpressure flag sent to BoltA task. We 
> end up in a situation where neither bolt can send tuples to the other, and 
> this causes the whole topology to grind to a halt. Failing tuples doesn't fix 
> it, as tuples aren't removed from the input or output queues.
>  
> Are there any suggestions for settings which may alleviate this? Ideally, to 
> turn the backpressure system off, or render it moot? If we can't find a 
> workaround, we'll probably be forced to go back to Storm1.
>  
> A possible fix may be to allow the task to consume items from the queue, as 
> long as it doesn't try to send any tuples to a backpressure'd task?
>  
> Many thanks,
> Simon Cooper
> This message, and any files/attachments transmitted together with it, is 
> intended for the use only of the person (or persons) to whom it is addressed. 
> It may contain information which is confidential and/or protected by legal 
> privilege. Accordingly, any dissemination, distribution, copying or use of 
> this message, or any part of it or anything sent together with it, other than 
> by intended recipients, may constitute a breach of civil or criminal law and 
> is hereby prohibited. Unless otherwise stated, any views expressed in this 
> message are those of the person sending it and not the sender's employer. No 
> responsibility, legal or otherwise, of whatever nature, is accepted as to the 
> accuracy of the contents of this message or for the completeness of the 
> message as received. Anyone who is not the intended recipient of this message 
> is advised to make no use of it and is requested to contact Featurespace 
> Limited as soon as possible. Any recipient of this message who has knowledge 
> or suspects that it may have been the subject of unauthorised interception or 
> alteration is also requested to contact Featurespace Limited.



Re: Kerberos in Storm

2020-06-30 Thread Ethan Li
Hi Srikant,

I am not able to tell without looking at more logs. Can you provide your 
storm.yaml and nimbus logs?

This doc might be helpful (I assume you are already following it, but just in 
case):
https://github.com/apache/storm/blob/master/SECURITY.md#authentication-kerberos
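For reference, the core client-side settings from that doc look roughly like this in storm.yaml (paths here are placeholders; check SECURITY.md for the full set, including the JAAS file contents):

```yaml
# Sketch of Kerberos-related settings from SECURITY.md; values are placeholders.
storm.thrift.transport: "org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
java.security.auth.login.config: "/path/to/jaas.conf"
```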




> On Jun 13, 2020, at 9:56 AM, shrikant kalani  wrote:
> 
> Hello Everyone
> 
> Hope someone will be able to help me. I am trying to enable Kerberos for 
> Storm 2.0.0 version but my nimbus instance is continuously crashing.
> 
> Below are the last entry in logs:
> 
> 2020-06-13 10:36:51 [main] LocalFsBlobStore [DEBUG] Deleting keys not on the 
> zookeeper []
> 2020-06-13 10:36:51 [main] LocalFsBlobStore [DEBUG] Creating list of key 
> entries for blobstore inside zookeeper [] local []
> 2020-06-13 10:36:51 [main-EventThread] NimbusInfo [INFO] Nimbus figures out 
> its name to testserver.xyz.com 
> 2020-06-13 10:36:51 [main
> (END)
> 
> In the Zookeeper logs it is mentioned that client has closed the session and 
> it starts deleting all empheral nodes.
> 
> Please can someone help me to understand this behaviour. There are no other 
> errors in the logs.
> 
> Thanks
> Srikant Kalani



Re: Query: Storm SSL Support

2020-06-30 Thread Ethan Li
No problem. Thanks Harish. 

> On Jun 30, 2020, at 5:12 PM, Kadirompalli Venkatashivareddy, Harish Kumar 
>  wrote:
> 
> Thanks a lot Ethan. Your suggestions are very much appreciated.
> We will evaluate the suggestions you provided based on the project timelines 
> we have. I truly hope we can contribute to the community as well
>  
> Regards,
> -Harish
>  
> From: Ethan Li 
> Reply-To: "user@storm.apache.org" 
> Date: Tuesday, June 30, 2020 at 3:09 PM
> To: "user@storm.apache.org" 
> Cc: "dev-h...@storm.apache.org" , "Dastoor, 
> Phiroze" , "Sapsford, Joe" 
> Subject: Re: Query: Storm SSL Support
>  
> Hi Harish,
>  
> As far as I know,  storm doesn’t have encryption between daemons (nimbus<—> 
> supervisor, supervisor<—> supervisor) at this point. Yes we should be able to 
> use SSL enabled thrift. But it is hard (at least for me) to say how much work 
> is needed without looking into it. Contribution on this is very much welcome.
>  
>  
> By the way, for inter-worker communication, you can use 
> BlowfishTupleSerializer: 
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/serialization/BlowfishTupleSerializer.java#L32
>  
>  
> But it has a great impact on performance. So you need to evaluate throughly 
> before using it. 
>  
> Thanks
>  
> -Ethan
> 
> 
>> On Jun 19, 2020, at 12:53 PM, Kadirompalli Venkatashivareddy, Harish Kumar 
>> wrote:
>>  
>> Hello Team,
>> We have been using Apache Storm for building data pipelines 
>> in our company.
>> We have some sensitive data and we would like to know if 
>> storm provides TLS support in the communication channels with in storm 
>> cluster (Nimbus -> Supervisor, Supervisor -> Supervisor).
>> I went over the Apache Storm documentation and found 
>> http://storm.apache.org/releases/1.2.3/SECURITY.html. The documentation 
>> suggests using IPSec for any data encryption; it doesn't describe how to 
>> configure SSL at the socket layer.
>>  
>> Only option what we see as of now is to change the storm code to 
>> use SSL enabled thrift classes and also use SSL enabled jetty. If anybody 
>> from d...@storm.apache.org can answer how 
>> complicated changing storm code can be for this. It will be very helpful ☺
>> We understand these changes add on to major maintenance cycles 
>> on our side. So before doing any change, we would like to check if there is 
>> any way we can add TLS support for our storm cluster through some 
>> configuration or any other means.
>>  
>> Harish Kumar K V
>> Senior Software Engineer, Search
>> M: +1 (408) 313 5574
>> 



Re: [Storm 2.1.0] Adjusting topology assigned memory

2020-06-30 Thread Ethan Li
Hi Pavithra,

The assignedMemory in the topology summary is the total memory assigned to your 
topology, so it changes when you change the number of executors. You can 
visit the topology page and it will show you how much memory is assigned to 
each component.

I suggest not changing -Xmx in worker.childopts, since worker.childopts is 
actually a template that is overridden before the supervisor launches the 
worker, based on the executors scheduled in that worker.

https://github.com/apache/storm/blob/master/conf/defaults.yaml#L194 

https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/supervisor/BasicContainer.java#L427-L444
 



By default, the memory requested by each component is 128MB:
https://github.com/apache/storm/blob/master/conf/defaults.yaml#L333

That should explain why the UI is showing 61696MB (= 128MB * 482 executors).

Since you changed worker.childopts to -Xmx16384m, every worker process runs 
with a 16384MB heap, but the UI doesn't know that. The UI uses the scheduling 
result to determine how much memory is "assigned" to the topology (and to its 
workers), so it shows 61696MB for the topology.
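As a sanity check, the UI number is just the per-executor default times the executor count. The setBolt/setMemoryLoad call in the comment below is part of the resource-aware scheduler API, and "myBolt" is a hypothetical component name:

```java
public class AssignedMemoryCheck {
    public static void main(String[] args) {
        // UI's assignedMemOnHeap = per-component default on-heap memory
        // (topology.component.resources.onheap.memory.mb, 128 MB) x executors.
        double perExecutorMb = 128.0;
        int executors = 482;
        double assignedMb = perExecutorMb * executors;
        System.out.println(assignedMb); // 61696.0
        // To change what the scheduler assigns (and the UI shows), raise the
        // requested resources instead of -Xmx, e.g. (hypothetical component):
        // builder.setBolt("myBolt", new MyBolt(), 16).setMemoryLoad(512.0);
    }
}
```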


-Ethan


> On Jun 19, 2020, at 12:57 AM, Pavithra Gunasekara 
>  wrote:
> 
> I'm still not able to figure out why assignedMemOnHeap for topology is 
> showing as 61696mb in storm UI and how this value is being calculated or 
> where it is being read. We have 482 executors/tasks, and I noticed, whenever 
> the number of executors/tasks are changed the assigned memory is getting 
> changed as well.
> 
> Really appreciate it if anyone can explain this to me.
> 
> /api/v1/topology/summary
> {"schedulerDisplayResource":false,"topologies":[{"owner":"root","requestedCpu":4820.0,"topologyVersion":null,"replicationCount":3,"stormVersion":"2.1.0","executorsTotal":482,"assignedMemOnHeap":61696.0,"assignedTotalMem":61696.0,"assignedCpu":4820.0,"requestedMemOnHeap":61696.0,"encodedId":"retail_network-1-1592465357","uptimeSeconds":80726,"uptime":"22h
>  25m 
> 26s","schedulerInfo":null,"requestedTotalMem":61696.0,"assignedMemOffHeap":0.0,"workersTotal":1,"requestedMemOffHeap":0.0,"name":"retail_network","id":"retail_network-1-1592465357","tasksTotal":482,"status":"ACTIVE"},{"owner":"root","requestedCpu":4820.0,"topologyVersion":null,"replicationCount":3,"stormVersion":"2.1.0","executorsTotal":482,"assignedMemOnHeap":61696.0,"assignedTotalMem":61696.0,"assignedCpu":4820.0,"requestedMemOnHeap":61696.0,"encodedId":"test_topo-2-1592465447","uptimeSeconds":80635,"uptime":"22h
>  23m 
> 55s","schedulerInfo":null,"requestedTotalMem":61696.0,"assignedMemOffHeap":0.0,"workersTotal":1,"requestedMemOffHeap":0.0,"name":"test-topo","id":"test_topo-2-1592465447","tasksTotal":482,"status":"ACTIVE"}]}
> 
> Thanks
> -Pavithra
> 
> On Thu, Jun 18, 2020 at 5:12 PM Pavithra Gunasekara 
> mailto:thilini.gunasek...@gmail.com>> wrote:
> Hi all,
> 
> We are in the process of migrating from Storm 1.1.3 to 2.1.0. In storm.yaml 
> we use worker.childopts to adjust the worker assigned memory. However in 
> Storm UI, it shows a different value for Assigned Mem(MB). 
> 
> We have following configs in storm.yaml among the others.
> 
> nimbus.childopts: "-Xmx2048m"
> supervisor.childopts: "-Xmx2048m"
> worker.childopts: "-Xmx16384m"
> 
> Could someone please help us find answers for the following questions?
> 
> 1.  Are we missing any memory related configs in storm.yaml?
> 2. Why is the UI showing a different value(61696 mb) than the one we have 
> used in the configuration file(16384 mb), from where does this value come 
> from?
> 
> Thanks & Regards,
> -Pavithra
> 
> 
> -- 
> -Pavithra



Re: Query: Storm SSL Support

2020-06-30 Thread Ethan Li
Hi Harish,

As far as I know, Storm doesn't have encryption between daemons (nimbus <-> 
supervisor, supervisor <-> supervisor) at this point. Yes, we should be able to 
use SSL-enabled Thrift, but it is hard (at least for me) to say how much work 
is needed without looking into it. Contributions on this are very much welcome.


By the way, for inter-worker communication, you can use 
BlowfishTupleSerializer: 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/serialization/BlowfishTupleSerializer.java#L32
 


But it has a great impact on performance, so you need to evaluate it 
thoroughly before using it.
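For completeness, enabling it is a topology-config change; the secret-key config name below comes from BlowfishTupleSerializer.SECRET_KEY in the linked source, and the key value itself is a placeholder:

```yaml
# Sketch: swap the tuple serializer and supply the Blowfish key (placeholder value)
topology.tuple.serializer: "org.apache.storm.security.serialization.BlowfishTupleSerializer"
topology.tuple.serializer.blowfish.key: "0123456789abcdef"
```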

Thanks

-Ethan

> On Jun 19, 2020, at 12:53 PM, Kadirompalli Venkatashivareddy, Harish Kumar 
>  wrote:
> 
> Hello Team,
> We have been using Apache Storm for building data pipelines 
> in our company.
> We have some sensitive data and we would like to know if 
> storm provides TLS support in the communication channels with in storm 
> cluster (Nimbus -> Supervisor, Supervisor -> Supervisor).
> I went over the Apache Storm documentation and found 
> http://storm.apache.org/releases/1.2.3/SECURITY.html. The documentation 
> suggests using IPSec for any data encryption; it doesn't describe how to 
> configure SSL at the socket layer.
>  
> Only option what we see as of now is to change the storm code to 
> use SSL enabled thrift classes and also use SSL enabled jetty. If anybody 
> from d...@storm.apache.org  can answer how 
> complicated changing storm code can be for this. It will be very helpful ☺
> We understand these changes add on to major maintenance cycles on 
> our side. So before doing any change, we would like to check if there is any 
> way we can add TLS support for our storm cluster through some configuration 
> or any other means.
>  
> Harish Kumar K V
> Senior Software Engineer, Search
> M: +1 (408) 313 5574
> 



Re: [ANNOUNCE] Apache Storm 2.2.0 Released

2020-06-30 Thread Ethan Li
Congratulations! Thanks Govind for the efforts on getting 2.2.0 released. And 
thanks to everyone for the contributions!

- Ethan 

> On Jun 30, 2020, at 11:36 AM, Govind Menon  wrote:
> 
> The Apache Storm community is pleased to announce the release of Apache
> Storm version 2.2.0.
> 
> Apache Storm is a distributed, fault-tolerant, and high-performance
> realtime computation system that provides strong guarantees on the
> processing of data. You can read more about Storm on the project website:
> 
> http://storm.apache.org 
> 
> Downloads of source and binary distributions are listed in our download
> section:
> 
> http://storm.apache.org/downloads.html 
> 
> 
> You can read more about this release in the following blog post:
> 
> https://storm.apache.org/2020/06/30/storm220-released.html 
> 
> 
> Distribution artifacts are available in Maven Central at the following
> coordinates:
> 
> groupId: org.apache.storm
> artifactId: storm-{component}
> version: 2.2.0
> 
> The full list of changes is available here[1]. Please let us know [2] if
> you encounter any problems.
> 
> Regards,
> 
> The Apache Storm Team
> 
> [1]:
> https://downloads.apache.org/storm/apache-storm-2.2.0/RELEASE_NOTES.html 
> 
> [2]: https://issues.apache.org/jira/browse/STORM 
> 
> 



Re: Storm 2.1 doesn't work on multiple workers

2020-04-08 Thread Ethan Li
Hi Fan,

To unsubscribe, please send an email to user-unsubscr...@storm.apache.org.
Please refer to http://storm.apache.org/getting-help.html

On Wed, Apr 8, 2020 at 11:22 PM Fan Jiang  wrote:

> unsubscribe
>
> On Thu, Apr 9, 2020, 12:21 AM Ethan Li  wrote:
>
>> Hi Simon,
>>
>> I just replied in the JIRA. I am interested and would like to look into
>> it. We have been running storm 2.x without any serious problem. Can you
>> provide a simple example to reproduce the issue? It will be very helpful
>> for me to debug this. Thanks
>>
>> Best
>> Ethan
>>
>> On Wed, Apr 1, 2020 at 5:22 AM Simon Cooper <
>> simon.coo...@featurespace.co.uk> wrote:
>>
>>> Hi,
>>>
>>> We've just upgraded to storm 2.1 from storm 1.2, and we can't run
>>> topologies deployed across more than one worker. The workers constantly
>>> restart with kryo and netty-related errors, very similar to STORM-3582 (we
>>> have also got custom serializers defined). We also can't roll back to storm
>>> 2.0 due to other issues.
>>>
>>> This is a very serious problem for us, and seems like a core piece of
>>> functionality is fundamentally broken. Is there anyone currently looking at
>>> this?
>>>
>>> Simon Cooper
>>>
>>


Re: Storm 2.1 doesn't work on multiple workers

2020-04-08 Thread Ethan Li
Hi Simon,

I just replied in the JIRA. I am interested and would like to look into it.
We have been running storm 2.x without any serious problem. Can you provide
a simple example to reproduce the issue? It will be very helpful for me to
debug this. Thanks

Best
Ethan

On Wed, Apr 1, 2020 at 5:22 AM Simon Cooper 
wrote:

> Hi,
>
> We've just upgraded to storm 2.1 from storm 1.2, and we can't run
> topologies deployed across more than one worker. The workers constantly
> restart with kryo and netty-related errors, very similar to STORM-3582 (we
> have also got custom serializers defined). We also can't roll back to storm
> 2.0 due to other issues.
>
> This is a very serious problem for us, and seems like a core piece of
> functionality is fundamentally broken. Is there anyone currently looking at
> this?
>
> Simon Cooper
>


Re: Is anyone available for some in-depth storm 2.1 explaining?

2020-04-08 Thread Ethan Li
Hi Peter,

I am not very familiar with the Trident Kafka spout, but I found something 
related that may help:
https://github.com/apache/storm/blob/master/docs/storm-kafka-client.md#manual-partition-assignment-advanced

Best
Ethan

On Tue, Apr 7, 2020 at 2:07 AM Peter Neubauer 
wrote:

> Hi there,
> we are trying to upgrade our Storm 1.2.3 installation (with Trident
> Kafka spouts) to 2.1. In our tests with 2 topic partitions, we cannot
> really see a pattern on how exactly the different partitions are
> polled (we get messages randomly left in one or both partitions
> without the spout moving, sometimes it moves after a couple of
> minutes, sometimes after we send more messages, sometimes after we
> remote-debug and insert a breakpoint into the KafkaConsumer). The new
> record-size fetch ConsumerConfig.MAX_POLL_RECORDS_CONFIG is very
> welcome btw!
>
> Is there anyone that can help us (even on a consulting base) to go to
> the bottom with the spout fetch behavior so we are not in the blind
> when updating?
>
> Just let me know, or maybe point to relevant documentation in case we
> missed that - thanks a lot!
>
> /peter
>
> G:  neubauer.peter
> S:  peter.neubauer
> P:  +46 704 106975
> L:   http://www.linkedin.com/in/neubauer
> T:   @peterneubauer
>
> Mapillary - Join the greatest expedition of our time.
>


Re: MTLS authentication for Apache Storm

2020-04-08 Thread Ethan Li
Hi Shrikant,

I am not aware of any plans for this currently. Proposals and code
contributions are welcome. Thanks

On Sat, Mar 28, 2020 at 10:25 AM shrikant kalani 
wrote:

> Hi
>
> Are there any plans that storm nimbus and supervisor will support MTLS in
> future releases ?
>
> Thanks
> Srikant Kalani
>
> Sent from my iPhone


Re: Behavior of heartbeats in 2.1

2020-04-08 Thread Ethan Li
Hi Andrew,

I just replied to your question in another email thread. I hope it helps.
In short,  in 2.x, you don't need PaceMaker.

On Thu, Mar 19, 2020 at 7:16 PM Andrew Neilson  wrote:

> We're working on moving from v0.9.5 to 2.1 right now and as you can
> imagine there have been quite a few changes :).
>
> One of the improvements we've been looking forward to is the different
> approach to heartbeats since we had observed the bottleneck in ZK's
> transaction log.
>
> I've seen references to this change in this list:
> https://issues.apache.org/jira/browse/STORM-2693
>
> Could anyone describe the default behavior of heartbeats in 2.1 vs 1.2? I
> understand that in 1.2 you can use Pacemaker to remove the dependency on ZK
> for heartbeats.
>
> But I haven't seen any up-to-date information on how a cluster should be
> set up today. If we are trying to avoid the ZK bottleneck do we still need
> to use Pacemaker? Or do we get that for free now?
>
> Thanks,
> Andrew
>


Re: Storm 2.0 worker heartbeat

2020-04-08 Thread Ethan Li
In summary, on 2.x you shouldn't need to worry about Pacemaker. And I
don't recommend supporting older-version topologies on a 2.x Storm
cluster; it is painful in practice.

On Wed, Apr 8, 2020 at 10:44 PM Ethan Li  wrote:

> Hi Andrew,
>
> There are two places Pacemaker can be used.
>
> 1. Storm 2.x actually supports running older version of topologies (i.e.
> topology compiled with storm 0.x or 1.x). We have been doing it inside my
> company for a long time. It's kind of painful to support it but it works.
> When older version of topologies running on 2.x Storm cluster, it runs
> older worker code and it can use PaceMaker to mitigate the performance
> issue on zookeeper. Hence the code here still check the heartbeat from
> Zookeeper if any.
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2219-L
>
>
> 2. In 2.x, workers write heartbeats to disk and supervisor pick them up
> and send to nimbus directly. It has another timer (run every 60s by
> default) to send executor status through PaceMaker/Zookeeper.
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L217-L220
>And these are executor metrics (emitted, executed, etc.) used on UI.
> This timer only happens every 60s. So it shouldn't overload zookeeper. But
> Pacemaker can still be used here.  The graph at
> https://github.com/apache/storm/pull/2389 might help to understand this.
>
> The code is merged at https://github.com/apache/storm/pull/2433
>
>
> On Wed, Mar 18, 2020 at 6:38 PM Andrew Neilson 
> wrote:
>
>> Hi Ethan,
>>
>> Pacemaker is not required but still can be used.
>>>
>>
>> Under what circumstances would Pacemaker be used on v2.0+? It's not
>> totally clear to me from how that ticket is written but it looks like that
>> replaced all of the heartbeat logic that was managed by Pacemaker and ZK in
>> older versions.
>>
>> Thanks,
>> Andrew
>>
>> On Thu, Mar 5, 2020 at 7:08 PM Ethan Li 
>> wrote:
>>
>>> Hi Surajeet,
>>>
>>> In 2.0, with the change in STORM-2693
>>> <https://issues.apache.org/jira/browse/STORM-2693>, supervisors will
>>> send worker heartbeat to nimbus directly. Pacemaker is not required but
>>> still can be used.
>>>
>>>
>>> Links:
>>> [1] https://issues.apache.org/jira/browse/STORM-2693
>>>
>>> Best,
>>> Ethan
>>>
>>> On Feb 18, 2020, at 1:03 PM, Surajeet Dev 
>>> wrote:
>>>
>>> I am aware that Pacemaker was introduced to avoid Zookeeper becoming a
>>> bottleneck when there is high volume of worker heartbeat.
>>>
>>> Will Pacemaker be still required if we upgrade to Storm 2.0? Or is it
>>> with 2.0 , the whole thing has been re-architected?
>>>
>>> Regards
>>> Surajeet
>>>
>>>
>>>


Re: Storm 2.0 worker heartbeat

2020-04-08 Thread Ethan Li
Hi Andrew,

There are two places Pacemaker can be used.

1. Storm 2.x actually supports running older versions of topologies (i.e.
topologies compiled against Storm 0.x or 1.x). We have been doing this inside my
company for a long time. It's kind of painful to support, but it works.
When an older-version topology runs on a 2.x Storm cluster, it runs the
older worker code, which can use Pacemaker to mitigate the performance
issue on ZooKeeper. Hence the code here still checks the heartbeats in
ZooKeeper, if any:
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2219-L


2. In 2.x, workers write heartbeats to disk, and the supervisor picks them up
and sends them to nimbus directly. There is another timer (run every 60s by
default) that sends executor status through Pacemaker/ZooKeeper:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L217-L220
These are the executor metrics (emitted, executed, etc.) shown on the UI.
Since this timer only fires every 60s, it shouldn't overload ZooKeeper, but
Pacemaker can still be used here. The graph at
https://github.com/apache/storm/pull/2389 might help to understand this.

The code is merged at https://github.com/apache/storm/pull/2433


On Wed, Mar 18, 2020 at 6:38 PM Andrew Neilson  wrote:

> Hi Ethan,
>
> Pacemaker is not required but still can be used.
>>
>
> Under what circumstances would Pacemaker be used on v2.0+? It's not
> totally clear to me from how that ticket is written but it looks like that
> replaced all of the heartbeat logic that was managed by Pacemaker and ZK in
> older versions.
>
> Thanks,
> Andrew
>
> On Thu, Mar 5, 2020 at 7:08 PM Ethan Li  wrote:
>
>> Hi Surajeet,
>>
>> In 2.0, with the change in STORM-2693
>> <https://issues.apache.org/jira/browse/STORM-2693>, supervisors will
>> send worker heartbeat to nimbus directly. Pacemaker is not required but
>> still can be used.
>>
>>
>> Links:
>> [1] https://issues.apache.org/jira/browse/STORM-2693
>>
>> Best,
>> Ethan
>>
>> On Feb 18, 2020, at 1:03 PM, Surajeet Dev 
>> wrote:
>>
>> I am aware that Pacemaker was introduced to avoid Zookeeper becoming a
>> bottleneck when there is high volume of worker heartbeat.
>>
>> Will Pacemaker be still required if we upgrade to Storm 2.0? Or is it
>> with 2.0 , the whole thing has been re-architected?
>>
>> Regards
>> Surajeet
>>
>>
>>


Re: Storm Authentication using Kerberos

2020-04-08 Thread Ethan Li
I have never done it before, but I guess it should work, either without
security or using SASL with Digest-MD5. Storm components are merely ZooKeeper
clients.

On Sat, Mar 28, 2020 at 10:09 AM shrikant kalani 
wrote:

> I have a storm cluster where I want to implement Kerberos Authentication
> mechanism. The Zookeeper cluster it connects to also serves many other
> applications.
>
> Is it possible I can implement Kerberos only for Storm components while
> Zookeeper is not Kerberized?
>
> Thanks
> Srikant Kalani
>
> Sent from my iPhone


Re: Custom metrics not showing up in storm UI - Storm version 1.1.0

2020-03-05 Thread Ethan Li
Sent too fast.


First of all, user-defined metrics will not show up on the Storm UI or in the REST API.

You can send your metrics somewhere else instead. You need to register metrics
consumers; please check
https://github.com/apache/storm/blob/v1.1.0/docs/Metrics.md#metrics-consumer

Where the metrics end up depends on what your metrics consumer does. For
example, with org.apache.storm.metric.LoggingMetricsConsumer, metrics will
be written to the log file.
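As a rough sketch of what that registration can look like in topology code (the class name is illustrative, not from the original thread; this assumes the Storm client jar on the classpath):

```java
import org.apache.storm.Config;
import org.apache.storm.metric.LoggingMetricsConsumer;

public class MetricsConsumerSetup {
    public static Config buildConf() {
        Config conf = new Config();
        // Register the bundled LoggingMetricsConsumer with parallelism 1.
        // Custom metrics registered by spouts/bolts are then written to the
        // workers' metrics log rather than shown on the UI.
        conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
        return conf;
    }
}
```

The same effect can also be configured cluster-wide via topology.metrics.consumer.register in storm.yaml.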

Best,
Ethan


> On Mar 5, 2020, at 10:36 PM, Ethan Li  wrote:
> 
> Hi Bala,
> 
> 
> You need to register metric consumers. Please check 
> https://github.com/apache/storm/blob/v1.1.0/docs/Metrics.md#metrics-consumer 
> <https://github.com/apache/storm/blob/v1.1.0/docs/Metrics.md#metrics-consumer>
> 
> Best
> Ethan
> 
> 
>> On Feb 13, 2020, at 5:48 PM, Bala wrote:
>> 
>> I am using the storm API to define and register custom metrics for the bolt. 
>> But I am not getting any stats about that metrics either in the Storm UI or 
>> the REST API? Do I need to do anything to make the custom metrics show up in 
>> the UI/REST API?
>> 
>> Thanks
>> Bala
> 



Re: Custom metrics not showing up in storm UI - Storm version 1.1.0

2020-03-05 Thread Ethan Li
Hi Bala,


You need to register metric consumers. Please check 
https://github.com/apache/storm/blob/v1.1.0/docs/Metrics.md#metrics-consumer 


Best
Ethan


> On Feb 13, 2020, at 5:48 PM, Bala  wrote:
> 
> I am using the storm API to define and register custom metrics for the bolt. 
> But I am not getting any stats about that metrics either in the Storm UI or 
> the REST API? Do I need to do anything to make the custom metrics show up in 
> the UI/REST API?
> 
> Thanks
> Bala



Re: TTransportException during shutdown of topology

2020-03-05 Thread Ethan Li
Hi

It’s hard to know without more logs. But it could be that other workers which
were connected to this worker shut down abruptly and the connection was closed
unexpectedly, according to the code at
https://github.com/apache/thrift/blob/0.12.0/lib/java/src/org/apache/thrift/transport/TIOStreamTransport.java#L132

Best,
Ethan

> On Feb 18, 2020, at 2:07 PM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) 
>  wrote:
> 
> Hi,
> 
> Getting the following errors during shutdown of the topology, can I know what 
> could be the reason?
> 
> 2020-02-15 17:23:06,190 ERROR TThreadPoolServer [pool-14-thread-35] Thrift 
> error occurred during processing of message.
> org.apache.storm.thrift.transport.TTransportException: null
> at 
> org.apache.storm.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:27) 
> ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.security.auth.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:145)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>  ~[storm-core-1.2.3.jar:1.2.3]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> 



Re: Issues with logging in Storm workers/topology

2020-03-05 Thread Ethan Li
Hi,

I don’t claim to know what’s going on here, but I just want to give you some
input from a little experiment of mine.


By simply replacing
https://github.com/apache/storm/blob/v2.1.0/log4j2/worker.xml#L21
with your pattern:

{"@timestamp":"%d{yyyy-MM-ddTHH:mm:ss.SSS}", "logger": "%logger",
"message":"%msg","thread_name":"%t","level":"%level"}%n

I am able to get the log you are looking for, e.g. (worker.log)

{"@timestamp":"2020-03-06 03:48:57,987", "logger": 
"org.apache.storm.daemon.Task", "message":"Emitting Tuple: taskId=23 
componentId=spout stream=default values=[snow white and the seven 
dwarfs]","thread_name":"Thread-33-spout-executor[23, 23]","level":"INFO”}

I didn’t change anything else. Maybe it has something to do with the packages
you configured (packages="ch.qos.logback.core") or some other setup.

Best
Ethan

> On Feb 11, 2020, at 7:35 AM, Panovski, Filip  
> wrote:
> 
> Hello,
>  
> I’m having difficulties with getting useful logging information out of my 
> workers/topology on Storm 2.1.0. We’re running our Storm workers as Docker 
> services in Swarm mode and forward STDOUT via a GELF Driver to our ELK stack.
>  
> To keep it as simple as possible, my current worker log4j2 configurations 
> look as follows (basically – log everything to STDOUT @ INFO, log some 
> topology-specific logs to STDOUT @ DEBUG):
>  
> cluster.xml
>  
>  packages="ch.qos.logback.core">
> 
> + %msg +%n 
> 
> 
> 
> 
> ${logstash}
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  
> worker.xml
>  packages="ch.qos.logback.core">
> 
>  name="logstash">{"@timestamp":"%d{-MM-ddTHH:mm:ss.SSS}", "logger": 
> "%logger", "message":"%msg","thread_name":"%t","level":"%level"}%n
> 
> 
> 
> 
> ${logstash}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  
> According to my this configuration, I would've expected a log message 
> formatted similarly to the following (newlines added for clarity):
>  
> {
>   "@timestamp": "2020-02-11 11:32:40,748",
>   "logger": "org.com.package.aggregation.SomeAggregation",
>   "message": "joinStateStream: values: [-4e30-49a6-8e1c-f7817633bb34, 
> 7c77-a622-4ae4-a504-2490db47cafe, 2020-02-11]",
>   "thread_name": "Thread-23-b-25-joinState-groupedQ-revQ-executor[22, 22]",
>   "level": "DEBUG"
> }
>  
>  
> The resulting message that is delivered, however, looks completely different 
> and ‘wrong’:
>  
> + Worker Process 273c05df-f087-43ca-a59a-e281bae98ab1:  
> {  
> "@timestamp":"2020-02-11 11:32:40,748",
> "logger": "STDERR",
> "message":
> "{
> "@timestamp":"2020-02-11 11:32:40,748",
> "logger": "org.com.package.aggregation.SomeAggregation",
> "message":"joinStateStream: values: 
> [-4e30-49a6-8e1c-f7817633bb34, 7c77-a622-4ae4-a504-2490db47cafe, 
> 2020-02-11]",
> 
> "thread_name":"Thread-23-b-25-joinState-groupedQ-revQ-executor[22, 22]",
> "level":"DEBUG"
> }",  
> "thread_name":"Thread-2",
> "level":"INFO"} +
>  
> Basically:
> -  It seems that any worker logs are actually logged by the supervisor 
> (“Worker Process ” prefix as well as the enclosing “+” from the 
> cluste.xml above)
> -  The worker logs themselves seem to be wrapped for some reason. The 
> “message” part of the outer message seems to contain yet another log message.
>  
> And so:
> 1.I took a look at the source for 
> “org.apache.storm.daemon.supervisor.BasicContainer#launch” and it doesn’t 
> seem like this log prefix (“Worker process :”) is configurable, which is 
> quite surprising. Is there a different way of deployment where this is 
> configurable?
> 2.What could be causing this “message” wrapping going on? As far as I can 
> see, the “wrappee” is my actual topology log message that I wish to parse, 
> and the “wrapper” is something else entirely (seemingly an error log? With 
> level info?  ¯\_(ツ)_/¯
>  
> Is there any way to simply get the worker/topology logs in a specified 
> format? As a last resort, I could potentially parse the worker.log files, but 
> I would rather not resort to that as it is not as robust as I’d like (worker 
> could potentially crash between writing to and reading from the log file).
>  
>  
> 



Re: localorshuffle grouping doubt

2020-03-05 Thread Ethan Li
They will not.

Please refer to the code here:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L85-L88

The targetTasks will be the tasks on the same worker; if there are none,
localOrShuffle behaves just like a normal shuffle grouping.
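To make the setup concrete, here is a hedged sketch of such a topology wiring with the Java API; MySpout/MyBolt are placeholders and the parallelism numbers just mirror the question:

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

public class LocalOrShuffleExample {
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 2);   // MySpout is a placeholder
        // localOrShuffle: bolt tasks in the same worker as the emitting spout
        // task are preferred; a worker whose bolt tasks have no local spout
        // task receives tuples as in a plain shuffle grouping.
        builder.setBolt("bolt", new MyBolt(), 8)       // MyBolt is a placeholder
               .localOrShuffleGrouping("spout");
        return builder.createTopology();
    }
}
```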


> On Feb 19, 2020, at 3:35 AM, Tarun Chabarwal  
> wrote:
> 
> Consider following configuration for a topology:
> #spouts: 2
> #bolt: 8
> #workers: 3
> 
> [attached pic]
> 
> With this configuration one of the workers won't have spout and If we apply 
> localorshuffle grouping between spout and bolt, does worker3 bolts receive 
> any tuple ?
> 
> Regards
> Tarun Chabarwal
> 



Re: Storm 2.0 worker heartbeat

2020-03-05 Thread Ethan Li
Hi Surajeet,

In 2.0, with the change in STORM-2693 [1], supervisors will send worker
heartbeats to Nimbus directly. Pacemaker is not required but can still be used.


Links: 
[1] https://issues.apache.org/jira/browse/STORM-2693 


Best,
Ethan

> On Feb 18, 2020, at 1:03 PM, Surajeet Dev  wrote:
> 
> I am aware that Pacemaker was introduced to avoid Zookeeper becoming a 
> bottleneck when there is high volume of worker heartbeat.
> 
> Will Pacemaker be still required if we upgrade to Storm 2.0? Or is it with 
> 2.0 , the whole thing has been re-architected?
> 
> Regards
> Surajeet



Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconneting zookeeper

2020-03-05 Thread Ethan Li
So you are seeing 65MB on UI. UI only shows assigned memory, not memory usage. 

As I mentioned earlier, -Xmx%HEAP-MEM%m in worker.childopts is designed to be
replaced with the total memory assigned to the worker (see
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L392).
Your config worker.childopts: "-Xmx2048m -XX:+PrintGCDetails" will make every
worker use a 2GB heap. But it will not show up on the UI as 2GB, because the UI
doesn’t read "-Xmx2048m" from worker.childopts.

The assigned memory on UI is the sum of the assigned memory of all the 
executors in the worker. The amount of  assigned memory for a worker depends on 
how it’s scheduled and what the executors in the worker are. 

For example, by default every instance/executor is configured with 128MB of
memory (https://github.com/apache/storm/blob/1.x-branch/conf/defaults.yaml#L276).
If 4 executors are scheduled in one worker, then the assigned memory for that
worker is 512MB.
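If you want the assigned memory to be something other than the 128MB-per-executor default, the per-component on-heap request can be set explicitly. A sketch, assuming the resource-aware API available in 1.x and later (the component name and numbers are illustrative):

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

public class MemoryRequestExample {
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        // Request 512MB of on-heap memory per executor of this bolt; if 4 such
        // executors land in one worker, the UI shows 2048MB assigned for it.
        builder.setBolt("bolt", new MyBolt(), 4)   // MyBolt is a placeholder
               .setMemoryLoad(512.0);
        return builder.createTopology();
    }
}
```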


Hope that helps. 


> On Feb 17, 2020, at 8:55 AM, Narasimhan Chengalvarayan 
>  wrote:
> 
> Hi Ethan Li,
> 
> 
> Sorry for the late reply. Please find the output where it is showing
> -Xmx2048m for worker heap . But in  storm ui we are seeing the worker
> allocated memory as 65MB for each worker.
> 
> java -server -Dlogging.sensitivity=S3 -Dlogfile.name=worker.log
> -Dstorm.home=/opt/storm/apache-storm-1.2.1
> -Dworkers.artifacts=/var/log/storm/workers-artifacts
> -Dstorm.id=Topology_334348-43-1580365369
> -Dworker.id=f1e3e060-0b32-4ecd-8c34-c486258264a4 -Dworker.port=6707
> -Dstorm.log.dir=/var/log/storm
> -Dlog4j.configurationFile=/opt/storm/apache-storm-1.2.1/log4j2/worker.xml
> -DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector
> -Dstorm.local.dir=/var/log/storm/tmp -Xmx2048m -XX:+PrintGCDetails
> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=artifacts/heapdump
> -Djava.library.path=/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/resources/Linux-amd64:/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/resources:/usr/local/lib:/opt/local/lib:/usr/lib
> -Dstorm.conf.file= -Dstorm.options=
> -Djava.io.tmpdir=/var/log/storm/tmp/workers/f1e3e060-0b32-4ecd-8c34-c486258264a4/tmp
> -cp 
> /opt/storm/apache-storm-1.2.1/lib/*:/opt/storm/apache-storm-1.2.1/extlib/*:/opt/storm/apache-storm-1.2.1/conf:/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/stormjar.jar
> org.apache.storm.daemon.worker Topology_334348-43-1580365369
> 7fe05c2b-ebcf-491b-a8cc-2565834b5988 6707
> f1e3e060-0b32-4ecd-8c34-c486258264a4
> 
> On Tue, 4 Feb 2020 at 04:36, Ethan Li  wrote:
>> 
>> This is where the worker launch command is composed:
>> 
>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L653-L671
>> 
>> Since your worker.childopts is set, and topology.worker.childopts is empty,
>> 
>> 
>> worker.childopts: "-Xmx2048m -XX:+PrintGCDetails
>> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=artifacts/heapdump”
>> 
>> 
>> The command to launch the worker process should have -Xmx2048m.
>> 
>> I don’t see why it would be 65MB. And what do you mean by coming as 65MB 
>> only? Is it only committed 65MB? Or is the max only 65MB?
>> 
>> Could you submit the topology and show the result of “ps -aux |grep 
>> --ignore-case worker”? This will show you the JVM parameters of the worker 
>> process.
>> 
>> 
>> (BTW, -Xmx%HEAP-MEM%m in worker.childopts is designed to be replaced 
>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L392)
>> 
>> 
>> 
>> 
>> On Jan 30, 2020, at 2:12 AM, Narasimhan Chengalvarayan 
>>  wrote:
>> 
>> Hi Ethan,
>> 
>> 
>> Please find the configuration detail
>> 
>> **
>> 
>> #Licensed to the Apache Software Foundation (ASF) under one
>> # or more contributor license agreements.  See the NOTICE file
>> # distributed with this work for additional information
>

Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconneting zookeeper

2020-02-03 Thread Ethan Li
This is where the worker launch command is composed:

https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L653-L671

Since your worker.childopts is set, and topology.worker.childopts is empty,


> worker.childopts: "-Xmx2048m -XX:+PrintGCDetails
> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=artifacts/heapdump”

The command to launch the worker process should have -Xmx2048m.  

I don’t see why it would be 65MB. And what do you mean by coming as 65MB only? 
Is it only committed 65MB? Or is the max only 65MB?

Could you submit the topology and show the result of “ps -aux |grep 
--ignore-case worker”? This will show you the JVM parameters of the worker 
process.


(BTW, -Xmx%HEAP-MEM%m in worker.childopts is designed to be replaced 
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L392)




> On Jan 30, 2020, at 2:12 AM, Narasimhan Chengalvarayan 
>  wrote:
> 
> Hi Ethan,
> 
> 
> Please find the configuration detail
> 
> **
> 
> #Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> # http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> 
> ### These MUST be filled in for a storm configuration
> storm.zookeeper.servers:
> - "ZK1"
> - "ZK2"
> - "ZK3"
> 
> nimbus.seeds: ["host1","host2"]
> ui.port : 8081
> storm.log.dir: "/var/log/storm"
> storm.local.dir: "/var/log/storm/tmp"
> supervisor.slots.ports:
> - 6700
> - 6701
> - 6702
> - 6703
> - 6704
> - 6705
> - 6706
> - 6707
> - 6708
> - 6709
> - 6710
> - 6711
> - 6712
> - 6713
> - 6714
> - 6715
> - 6716
> - 6717
> worker.heap.memory.mb: 1639
> topology.worker.max.heap.size.mb: 1639
> worker.childopts: "-Xmx2048m -XX:+PrintGCDetails
> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=artifacts/heapdump"
> worker.gc.childopts: ""
> 
> topology.min.replication.count: 2
> #
> #
> # # These may optionally be filled in:
> #
> ## List of custom serializations
> # topology.kryo.register:
> # - org.mycompany.MyType
> # - org.mycompany.MyType2: org.mycompany.MyType2Serializer
> #
> ## List of custom kryo decorators
> # topology.kryo.decorators:
> # - org.mycompany.MyDecorator
> #
> ## Locations of the drpc servers
> # drpc.servers:
> # - "server1"
> # - "server2"
> 
> ## Metrics Consumers
> # topology.metrics.consumer.register:
> #   - class: "org.apache.storm.metric.LoggingMetricsConsumer"
> # parallelism.hint: 1
> #   - class: "org.mycompany.MyMetricsConsumer"
> # parallelism.hint: 1
> # argument:
> #   - endpoint: "metrics-collector.mycompany.org"
> 
> *
> 
> 
> On Thu, 30 Jan 2020 at 03:07, Ethan Li  wrote:
>> 
>> I am not sure. Can you provide your configs?
>> 
>> 
>> 
>>> On Jan 28, 2020, at 6:33 PM, Narasimhan Chengalvarayan 
>>>  wrote:
>>> 
>>> Hi Team,
>>> 
>>> Do you have any idea, In storm apache 1.1.0 we have set worker size as
>>> 2 GB , Once we upgrade to 1.2.1 .It was coming as 65MB only. please
>>> help

Re: Bolts Acking

2020-01-29 Thread Ethan Li
Hi,

If you don’t anchor at all, collector.ack will basically do nothing (because
input.getMessageId().getAnchorsToIds() is empty).

It will still accumulate the metrics (acked count), though. If you are not
acking at all, you can set “topology.acker.executors” to zero.
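A minimal sketch of disabling the ackers from topology code (equivalent to setting topology.acker.executors: 0 in the config; the class name is illustrative):

```java
import org.apache.storm.Config;

public class NoAckersConf {
    public static Config buildConf() {
        Config conf = new Config();
        // With zero acker executors, tuple trees are never tracked, so
        // collector.ack() costs almost nothing and tuples are never replayed.
        conf.setNumAckers(0);
        return conf;
    }
}
```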

Code is here:

Ack:  
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java#L121
Emit: 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java#L82

> On Jan 27, 2020, at 4:45 PM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) 
>  wrote:
> 
> Hi,
> 
> All of my bolts in the topology implement BaseRichBolt and following is the 
> signature of prepare method, and I do not pass the input tuple as an argument 
> in the collector.emit of any of the bolts execute implementation. So, 
> essentially am not anchoring the tuples. 
> 
> Since I am not anchoring the tuples and using BaseRichBolt, what happens with 
> collector.ack ? Documentation says BaseRichBolt does not implicitly anchor or 
> ack the tuples. 
> 
> What is the behaviour of collector.ack, also can you point me to the 
> implementation of collector.ack?
> 
> @Override
> public final void prepare(@SuppressWarnings("rawtypes") Map stormConf,
> TopologyContext context, OutputCollector collector) {



Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconneting zookeeper

2020-01-29 Thread Ethan Li
I am not sure. Can you provide your configs? 



> On Jan 28, 2020, at 6:33 PM, Narasimhan Chengalvarayan 
>  wrote:
> 
> Hi Team,
> 
> Do you have any idea, In storm apache 1.1.0 we have set worker size as
> 2 GB , Once we upgrade to 1.2.1 .It was coming as 65MB only. please
> help us .DO we need to follow different configuration setting for
> storm 1.2.1 or it is a bug.
> 
> On Mon, 27 Jan 2020 at 16:44, Narasimhan Chengalvarayan
>  wrote:
>> 
>> Hi Team,
>> 
>> In storm 1.2.1 version, worker memory is showing as 65MB. But we have
>> set worker memory has 2GB.
>> 
>> On Fri, 24 Jan 2020 at 01:25, Ethan Li  wrote:
>>> 
>>> 
>>> 1) What is stored in Workerbeats znode?
>>> 
>>> 
>>> Worker periodically sends heartbeat to zookeeper under workerbeats node.
>>> 
>>> 2) Which settings control the frequency of workerbeats update
>>> 
>>> 
>>> 
>>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L1534-L1539
>>> task.heartbeat.frequency.secs Default to 3
>>> 
>>> 3)What will be the impact if the frequency is reduced
>>> 
>>> 
>>> Nimbus get the worker status from workerbeat znode to know if executors on 
>>> workers are alive or not.
>>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L595-L601
>>> If heartbeat exceeds nimbus.task.timeout.secs (default to 30), nimbus will 
>>> think the certain executor is dead and try to reschedule.
>>> 
>>> To reduce the issue on zookeeper, a pacemaker component was introduced. 
>>> https://github.com/apache/storm/blob/master/docs/Pacemaker.md
>>> You might want to use it too.
>>> 
>>> Thanks
>>> 
>>> 
>>> On Dec 10, 2019, at 4:36 PM, Surajeet Dev  wrote:
>>> 
>>> We upgraded Storm version to 1.2.1 , and since then have been consistently 
>>> observing Zookeeper session timeouts .
>>> 
>>> On analysis , we observed that there is high frequency of updates on 
>>> workerbeats znode with data upto size of 50KB. This causes the Garbage 
>>> Collector to kickoff lasting more than 15 secs , resulting in Zookeper 
>>> session timeout
>>> 
>>> I understand , increasing the session timeout will alleviate the issue , 
>>> but we have already done that twice
>>> 
>>> My questions are:
>>> 
>>> 1) What is stored in Workerbeats znode?
>>> 2) Which settings control the frequency of workerbeats update
>>> 3)What will be the impact if the frequency is reduced
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Thanks
>> C.Narasimhan
>> 09739123245
> 
> 
> 
> -- 
> Thanks
> C.Narasimhan
> 09739123245



Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconneting zookeeper

2020-01-23 Thread Ethan Li

> 1) What is stored in Workerbeats znode?

Worker periodically sends heartbeat to zookeeper under workerbeats node.

> 2) Which settings control the frequency of workerbeats update


https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L1534-L1539
task.heartbeat.frequency.secs (defaults to 3)

> 3)What will be the impact if the frequency is reduced

Nimbus gets the worker status from the workerbeats znode to know whether the
executors on workers are alive:
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L595-L601
If a heartbeat exceeds nimbus.task.timeout.secs (default 30), Nimbus will
consider that executor dead and try to reschedule it.

To reduce the load on ZooKeeper, the Pacemaker component was introduced:
https://github.com/apache/storm/blob/master/docs/Pacemaker.md
You might want to use it too.

Thanks


> On Dec 10, 2019, at 4:36 PM, Surajeet Dev  wrote:
> 
> We upgraded Storm version to 1.2.1 , and since then have been consistently 
> observing Zookeeper session timeouts . 
> 
> On analysis , we observed that there is high frequency of updates on 
> workerbeats znode with data upto size of 50KB. This causes the Garbage 
> Collector to kickoff lasting more than 15 secs , resulting in Zookeper 
> session timeout
> 
> I understand , increasing the session timeout will alleviate the issue , but 
> we have already done that twice 
> 
> My questions are:
> 
> 1) What is stored in Workerbeats znode?
> 2) Which settings control the frequency of workerbeats update
> 3)What will be the impact if the frequency is reduced
> 
> 



Re: Worker never starts

2020-01-23 Thread Ethan Li
> 1. Where exactly can we configure the worker nodes? because I am not able to 
> identify any worker node related configurations in storm.yaml file attached.

See if this link helps:
https://storm.apache.org/releases/2.1.0/Setting-up-a-Storm-cluster.html
You need to set up a ZooKeeper cluster and configure the supervisors properly.
Then the supervisors will register themselves with ZooKeeper and become
available for Nimbus to schedule tasks on.


> 2. In my current cluster when I run any topology I am not able to see log 
> files getting created, 

You can start the ui and logviewer daemons to better understand the status of
your Storm cluster and topologies, if you haven’t already. Make sure the
topology is “running”.

> 3. How to identify the reason for not getting this log file creation ??

Is the topology running? You can get some idea of the topology status on the
UI. Then you can check the daemon logs, such as the supervisor and nimbus logs.
Also check the workers-artifacts directory (under the Storm log directory) to
see if any workers are scheduled.


> On Jan 16, 2020, at 7:31 AM, Srikanth bathi  wrote:
> 
> Hi Ethan,
> 
> Thanks for quick reply,
> 
> I am not able to understand the logs properly, can you please help me to 
> understand log file structure and config file for multi node cluster.
> 
> 1. Where exactly can we configure the worker nodes? because I am not able to 
> identify any worker node related configurations in storm.yaml file attached.
> 2. In my current cluster when I run any topology I am not able to see log 
> files getting created, 
> 3. How to identify the reason for not getting this log file creation ??
> 
> Please let me help how to configure to add some extra worker nodes.
> 
> Thanks,
> Srikanth
> 
> On Fri, Jan 10, 2020 at 10:18 PM Ethan Li  <mailto:ethanopensou...@gmail.com>> wrote:
> Hi Srikanth,
> 
> It’s hard to tell without enough logs/informations. Could you attach your 
> configs, supervisor logs to start with?
> 
> Best,
> Ethan
> 
> 
>> On Jan 10, 2020, at 10:28 AM, Srikanth bathi > <mailto:srikanth581...@gmail.com>> wrote:
>> 
>> 
>> 
>> -- Forwarded message -
>> From: Srikanth bathi > <mailto:srikanth581...@gmail.com>>
>> Date: Fri, Jan 10, 2020 at 9:52 PM
>> Subject: Worker never starts
>> To: mailto:user@storm.apache.org>>
>> 
>> 
>> Hi Team, 
>> 
>> Can you please help me on the issue workers never getting started.
>> 
>> Supervisor is trying to create a worker which is never happened.
>> 
>> Please help me how to debug this, because i do not see any logs apart from 
>> Current supervisor time: 1578670653. State: :not-started, Heartbeat: nil
>> 
>> 
>> Supervisor is trying to assign worker in one of the ports configured 
>> 6700
>> - 6701
>> - 6702
>> - 6703
>> - 6704
>> - 6705
>> - 6706
>> - 6707
>> - 6708
>> - 6709
>> - 6710
>> - 6711
>> - 6712
>> - 6713
>> - 6714
>> - 6715
>> - 6716
>> - 6717
>> - 6718
>> - 6719
>> - 6720
>> which is failing. 
>> 
>> -- 
>> srikanth
>>  
>> 
>> 
>> -- 
>> srikanth
>>  
>> 
> 
> 
> 
> -- 
> srikanth
>  
> 



Re: Change Storm binding address in local mode

2020-01-10 Thread Ethan Li
Hi Andrea,

I don’t see a way to change binding address for zookeeper if you are using 
InProcessZookeeper or DevZookeeper.

https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/zookeeper/Zookeeper.java#L78
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/testing/InProcessZookeeper.java#L34
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/command/DevZookeeper.java#L31
https://github.com/apache/storm/blob/3ec02e0b1a4c9c35f67da803ac9e3c9d89ab7eb7/docs/Command-line-client.md#dev-zookeeper

However, you should be able to set up a ZooKeeper quorum and change
storm.zookeeper.servers and storm.zookeeper.port:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1092-L1096
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1102
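For reference, pointing a topology config at an external quorum might look like this (hostnames and port are placeholders):

```java
import java.util.Arrays;
import org.apache.storm.Config;

public class ZkConf {
    public static Config buildConf() {
        Config conf = new Config();
        // Corresponds to storm.zookeeper.servers / storm.zookeeper.port
        conf.put(Config.STORM_ZOOKEEPER_SERVERS,
                 Arrays.asList("zk1.example.com", "zk2.example.com", "zk3.example.com"));
        conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);
        return conf;
    }
}
```

The same keys can of course be set in storm.yaml instead of topology code.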

Best,
Ethan


> On Dec 8, 2019, at 5:26 AM, Andrea Cardaci  wrote:
> 
> On Sun, 1 Dec 2019 at 20:11, Andrea Cardaci  wrote:
>> Storm (or some of its services) listens on the TCP port 2000 during
>> the local execution. I'm using a LocalCluster and I'm submitting the
>> topology programmatically.
> 
> So does anyone know what listens on port 2000 in local mode and how to
> change its binding address to localhost?



Re: Worker never starts

2020-01-10 Thread Ethan Li
Hi Srikanth,

It’s hard to tell without enough logs/informations. Could you attach your 
configs, supervisor logs to start with?

Best,
Ethan


> On Jan 10, 2020, at 10:28 AM, Srikanth bathi  wrote:
> 
> 
> 
> -- Forwarded message -
> From: Srikanth bathi  >
> Date: Fri, Jan 10, 2020 at 9:52 PM
> Subject: Worker never starts
> To: mailto:user@storm.apache.org>>
> 
> 
> Hi Team, 
> 
> Can you please help me on the issue workers never getting started.
> 
> Supervisor is trying to create a worker which is never happened.
> 
> Please help me how to debug this, because i do not see any logs apart from 
> Current supervisor time: 1578670653. State: :not-started, Heartbeat: nil
> 
> 
> Supervisor is trying to assign worker in one of the ports configured 
> 6700
> - 6701
> - 6702
> - 6703
> - 6704
> - 6705
> - 6706
> - 6707
> - 6708
> - 6709
> - 6710
> - 6711
> - 6712
> - 6713
> - 6714
> - 6715
> - 6716
> - 6717
> - 6718
> - 6719
> - 6720
> which is failing. 
> 
> -- 
> srikanth
>  
> 
> 
> -- 
> srikanth
>  
> 



Re: Upgrading storm 1.1 to 2.0 Command Line Arguments Issue

2019-10-31 Thread Ethan Li
Hi Ray,

There is a known bug in 2.0.0 and it’s fixed in 2.1.0. Please try 2.1.0 and see 
if it works for you. 

Thanks,
Ethan

> On Oct 16, 2019, at 4:10 PM, Zac  wrote:
> 
> Hi all,
> 
> While upgrading storm versions, I found that I could not submit command line 
> arguments to my topology because they were being modified before reaching my 
> topology. My command was like this:
> 
> path-to-storm-bin jar jar-to-run topology-class --foo my_foo --bar my_bar
> 
> In 2.0, I get an error saying unrecognized argument foo my_foo (the -- is 
> missing from foo). The only solution I have been able to use has been to 
> surround all the arguments with quotes so it looks as follows:
> 
> path-to-storm-bin jar jar-to-run topology-class "--foo my_foo --bar my_bar"
> 
> And then to grab all the arguments as a single argument and split them up 
> myself. This doesn't seem like it should be the way to do it. Is there 
> another way you are supposed to use command line args?
> 
> Thanks
> -Ray
> 
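For anyone stuck on 2.0.0 before the fix, the quoted workaround (passing all options as one quoted string and splitting them in the topology's main method) can be sketched as below. The class and helper names are only illustrative, not Storm API:

```java
public class ArgSplitSketch {
    // If the whole option string arrived as a single quoted argument
    // (the 2.0.0 workaround described above), split it back into tokens;
    // otherwise pass the arguments through untouched.
    static String[] splitArgs(String[] args) {
        if (args.length == 1 && args[0].contains(" ")) {
            return args[0].trim().split("\\s+");
        }
        return args;
    }

    public static void main(String[] rawArgs) {
        String[] args = splitArgs(new String[]{"--foo my_foo --bar my_bar"});
        // Prints the recovered tokens joined by commas
        System.out.println(String.join(",", args));
    }
}
```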



[ANNOUNCE] Apache Storm 2.1.0 Released

2019-10-31 Thread Ethan Li
The Apache Storm community is pleased to announce the release of Apache
Storm version 2.1.0.

Apache Storm is a distributed, fault-tolerant, and high-performance
realtime computation system that provides strong guarantees on the
processing of data. You can read more about Storm on the project website:

http://storm.apache.org

Downloads of source and binary distributions are listed in our download
section:

http://storm.apache.org/downloads.html

You can read more about this release in the following blog post:

https://storm.apache.org/2019/10/31/storm210-released.html

Distribution artifacts are available in Maven Central at the following
coordinates:

groupId: org.apache.storm
artifactId: storm-{component}
version: 2.1.0

The full list of changes is available here[1]. Please let us know [2] if
you encounter any problems.

Regards,

The Apache Storm Team

[1]:
http://www.us.apache.org/dist/storm/apache-storm-2.1.0/RELEASE_NOTES.html
[2]: https://issues.apache.org/jira/browse/STORM


Re: Storm 2.1

2019-10-23 Thread Ethan Li
Hi Simon,

There are a few issues related to the CLI that should be fixed shortly. We will 
then have a new RC for a vote. 



> On Oct 23, 2019, at 4:29 AM, Simon Cooper  
> wrote:
> 
> Hi,
>  
> Do we have an ETA for Storm 2.1 yet? I know there were a couple of RCs a 
> while ago, but everything seems to have gone quiet? There are some bug fixes 
> in there for issues stopping us from upgrading to Storm 2.0, so we're still 
> stuck on 1.2.x for the moment…
>  
> Thanks,
> Simon Cooper



Re: Storm Diagnostics Tool

2019-10-09 Thread Ethan Li
Hi Sreeram,

It’s great to see such tools built for helping to solve performance problem for 
topologies. Thanks for sharing!

Best,
Ethan


> On Oct 4, 2019, at 7:34 AM, Sreeram  wrote:
> 
> Dear Storm Community,
> 
> Greetings!
> 
> We (GSTN India team) use Apache Storm for our workloads. To assist in regular 
> ops use cases like diagnosing slow workers, thread dump analysis, and 
> identifying bolts for rebalancing, we have developed a web-based tool: 
> https://github.com/GSTNIndia/Storm-Diagnostics
> 
> The tool is quite straightforward to set up and use (refer to the 'Quick Start' 
> section in the repository's README page).
> 
> Would love to hear the community's feedback!
> 
> Best,
> Sreeram
> 



Re: Decline in apache storm usage

2019-08-29 Thread Ethan Li
Sorry I don’t have knowledge about that. 

> On Aug 29, 2019, at 11:32 AM, Gunjan Dave  wrote:
> 
> Thanks Ethan for getting back.
> 
> Would you be aware if there are any plans of cloud providers such as google, 
> aws or azure providing  apache storm cloud services similar to how GCP has 
> done it for apache spark/gcp-proc and apache-beam/gcp-dataflow? 
> 
> 
> On Thu, Aug 29, 2019, 20:54 Ethan Li <ethanopensou...@gmail.com> wrote:
> Hi Gunjan,
> 
> I can’t speak for general usages. But the company I work for is using Apache 
> Storm very heavily. 
> 
> 
>> On Aug 28, 2019, at 9:44 AM, Gunjan Dave <gunjanpiyushd...@gmail.com> wrote:
>> 
>> 
>> Hello devs,
>> Seeing that Apache Storm usage is declining. Do you attribute this to 
>> anything?
>> Even though the Storm documentation shows quite a number of companies using 
>> Storm, it seems like that page is outdated. 
>> 
>> Is storm community taking any steps to increase the usage and adoption again?
>> 
>> Also, it seems like a lot of cloud providers are not offering Storm-based 
>> services as opposed to other streaming solutions. Is the Storm community doing 
>> anything to get more cloud adoption for Storm? 
> 



Re: Decline in apache storm usage

2019-08-29 Thread Ethan Li
Hi Gunjan,

I can’t speak for general usages. But the company I work for is using Apache 
Storm very heavily. 


> On Aug 28, 2019, at 9:44 AM, Gunjan Dave  wrote:
> 
> 
> Hello devs,
> Seeing that Apache Storm usage is declining. Do you attribute this to 
> anything?
> Even though the Storm documentation shows quite a number of companies using 
> Storm, it seems like that page is outdated. 
> 
> Is storm community taking any steps to increase the usage and adoption again?
> 
> Also, it seems like a lot of cloud providers are not offering Storm-based 
> services as opposed to other streaming solutions. Is the Storm community doing 
> anything to get more cloud adoption for Storm? 



Re: pacemaker question

2019-08-23 Thread Ethan Li
Yes. Zookeeper is required: 
https://storm.apache.org/releases/2.0.0/Setting-up-a-Storm-cluster.html

Even if you run Apache Storm in local mode 
(https://storm.apache.org/releases/2.0.0/Local-mode.html), an in-process 
Zookeeper will be launched: 
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/LocalCluster.java#L144

> On Aug 23, 2019, at 8:32 AM, Stig Rohde Døssing  
> wrote:
> 
> As far as I know you can't use Storm without Zookeeper. Pacemaker is just for 
> receiving heartbeats, Storm still needs to store other information in 
> Zookeeper, e.g. which topologies are deployed.
> 
> On Fri, Aug 23, 2019 at 10:57, Igor A. <igor.f...@gmail.com> wrote:
> Ethan, 
> 
> >Pacemaker is an additional daemon that takes some load from Zookeeper so it 
> >will not be overloaded, which means Zookeeper is still required. 
> 
> could you please confirm that Zookeeper is required in any Storm 
> configuration?
> 
> On Thu, Aug 22, 2019, 19:06 Ethan Li <ethanopensou...@gmail.com> wrote:
> Hi Ignor,
> 
> If you are running a 2.0.0 or higher version of topology on a 2.x Storm cluster, 
> thanks to the changes of https://issues.apache.org/jira/browse/STORM-2693 and 
> https://github.com/apache/storm/pull/2433, Pacemaker is no longer needed. 
> It won’t do anything. 
> 
> If you are running an older version of topology on a 2.x Storm cluster, Pacemaker 
> is an additional daemon that takes some load off Zookeeper so it will not be 
> overloaded, which means Zookeeper is still required: 
> https://github.com/apache/storm/blob/master/docs/Pacemaker.md. So you will 
> see supervisors connecting to zookeeper. 
> 
> storm.zookeeper.port is used to connect to zookeeper, so you want to set it to 
> your zookeeper port, which defaults to 2181: 
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1067
> 
> 
> Thanks
> Ethan
> 
> 
>> On Aug 22, 2019, at 10:24 AM, Igor A. <igor.f...@gmail.com> wrote:
>> 
>> Hello all,
>> 
>> I'm trying to run Storm version 2.0.0 with Pacemaker.
>> I've set up everything like it is described in the documentation (there are not 
>> so many parameters to miss something). Pacemaker is listening on the default 
>> :6699.
>> The issue is despite the fact storm.cluster.state.store is set to 
>> "[...].PaceMakerStateStorageFactory", Storm Supervisor tries to connect to 
>> Zookeeper on :2181.
>> 
>> I can see in logs that Storm launched Zookeeper client and it runs in 3.4 
>> compatibility mode.
>> I tried to set storm.zookeeper.port to 6699, but this caused Pacemaker to 
>> die with an NPE. Note: after reading the sources I've found that PacemakerClient 
>> doesn't use zookeeper.port.
>> Does anyone know how to make Storm communicate with Pacemaker? Maybe 
>> somebody has a working storm.yaml that enables the Pacemaker client?
>> 
>> 
>> 
> 



Re: pacemaker question

2019-08-22 Thread Ethan Li
Sorry Igor for not typing your name correctly.

> On Aug 22, 2019, at 11:06 AM, Ethan Li  wrote:
> 
> Hi Ignor,
> 
> If you are running a 2.0.0 or higher version of topology on a 2.x Storm cluster, 
> thanks to the changes of https://issues.apache.org/jira/browse/STORM-2693 and 
> https://github.com/apache/storm/pull/2433, Pacemaker is no longer needed. 
> It won’t do anything. 
> 
> If you are running an older version of topology on a 2.x Storm cluster, Pacemaker 
> is an additional daemon that takes some load off Zookeeper so it will not be 
> overloaded, which means Zookeeper is still required: 
> https://github.com/apache/storm/blob/master/docs/Pacemaker.md. So you will 
> see supervisors connecting to zookeeper. 
> 
> storm.zookeeper.port is used to connect to zookeeper, so you want to set it to 
> your zookeeper port, which defaults to 2181: 
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1067
> 
> 
> Thanks
> Ethan
> 
> 
>> On Aug 22, 2019, at 10:24 AM, Igor A. <igor.f...@gmail.com> wrote:
>> 
>> Hello all,
>> 
>> I'm trying to run Storm version 2.0.0 with Pacemaker.
>> I've set up everything like it is described in the documentation (there are not 
>> so many parameters to miss something). Pacemaker is listening on the default 
>> :6699.
>> The issue is despite the fact storm.cluster.state.store is set to 
>> "[...].PaceMakerStateStorageFactory", Storm Supervisor tries to connect to 
>> Zookeeper on :2181.
>> 
>> I can see in logs that Storm launched Zookeeper client and it runs in 3.4 
>> compatibility mode.
>> I tried to set storm.zookeeper.port to 6699, but this caused Pacemaker to 
>> die with an NPE. Note: after reading the sources I've found that PacemakerClient 
>> doesn't use zookeeper.port.
>> Does anyone know how to make Storm communicate with Pacemaker? Maybe 
>> somebody has a working storm.yaml that enables the Pacemaker client?
>> 
>> 
>> 
> 



Re: pacemaker question

2019-08-22 Thread Ethan Li
Hi Ignor,

If you are running a 2.0.0 or higher version of topology on a 2.x Storm cluster, 
thanks to the changes of https://issues.apache.org/jira/browse/STORM-2693 and 
https://github.com/apache/storm/pull/2433, Pacemaker is no longer needed. It 
won’t do anything. 

If you are running an older version of topology on a 2.x Storm cluster, Pacemaker 
is an additional daemon that takes some load off Zookeeper so it will not be 
overloaded, which means Zookeeper is still required: 
https://github.com/apache/storm/blob/master/docs/Pacemaker.md. So you will 
see supervisors connecting to zookeeper. 

storm.zookeeper.port is used to connect to zookeeper, so you want to set it to 
your zookeeper port, which defaults to 2181: 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1067


Thanks
Ethan
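For reference, the ZooKeeper-related lines in storm.yaml should point at the ZooKeeper ensemble itself, never at Pacemaker's 6699 port. A minimal sketch (host name is a placeholder; Pacemaker's own settings are described in the Pacemaker.md doc linked above):

```yaml
# ZooKeeper is always required, even with Pacemaker enabled.
storm.zookeeper.servers:
  - "zk1.example.com"
storm.zookeeper.port: 2181   # ZooKeeper's port, not Pacemaker's 6699
```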


> On Aug 22, 2019, at 10:24 AM, Igor A.  wrote:
> 
> Hello all,
> 
> I'm trying to run Storm version 2.0.0 with Pacemaker.
> I've set up everything like it is described in the documentation (there are not so 
> many parameters to miss something). Pacemaker is listening on the default :6699.
> The issue is despite the fact storm.cluster.state.store is set to 
> "[...].PaceMakerStateStorageFactory", Storm Supervisor tries to connect to 
> Zookeeper on :2181.
> 
> I can see in logs that Storm launched Zookeeper client and it runs in 3.4 
> compatibility mode.
> I tried to set storm.zookeeper.port to 6699, but this caused Pacemaker to die 
> with an NPE. Note: after reading the sources I've found that PacemakerClient 
> doesn't use zookeeper.port.
> Does anyone know how to make Storm communicate with Pacemaker? Maybe 
> somebody has a working storm.yaml that enables the Pacemaker client?
> 
> 
> 



Re: Storm 2.0.0 build error

2019-08-15 Thread Ethan Li
I just tried and it was able to compile using mvn clean install -DskipTests -P 
'native,include-shaded-deps,!externals,!examples,!coverage'

[INFO] 
[INFO] Reactor Summary for Storm 2.0.0:
[INFO]
[INFO] Storm .. SUCCESS [ 30.905 s]
[INFO] Apache Storm - Checkstyle .. SUCCESS [  2.093 s]
[INFO] multilang-javascript ... SUCCESS [  0.262 s]
[INFO] multilang-python ... SUCCESS [  0.100 s]
[INFO] multilang-ruby . SUCCESS [  0.122 s]
[INFO] maven-shade-clojure-transformer  SUCCESS [  6.181 s]
[INFO] storm-maven-plugins  SUCCESS [  5.707 s]
[INFO] Shaded Deps for Storm Client ... SUCCESS [  9.584 s]
[INFO] Storm Client ... SUCCESS [ 41.827 s]
[INFO] storm-server ... SUCCESS [ 12.527 s]
[INFO] storm-clojure .. SUCCESS [  6.936 s]
[INFO] Storm Core . SUCCESS [ 14.464 s]
[INFO] Storm Webapp ... SUCCESS [ 10.393 s]
[INFO] storm-clojure-test . SUCCESS [  5.490 s]
[INFO] storm-submit-tools . SUCCESS [  1.451 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  02:32 min
[INFO] Finished at: 2019-08-16T04:44:33Z
[INFO] 


Not sure what happened to your build. Maybe others have some idea about it.

> On Aug 15, 2019, at 8:58 PM, Kang Minwoo  wrote:
> 
> Hello,
> 
> I try to compile storm 2.0.0 from source.
> But I had an error. that is "class file for 
> org.apache.yetus.audience.InterfaceAudience not found"
> 
> I don't know why Maven throws this compilation error.
> 
> Build command: mvn clean install -DskipTests -P 
> 'native,include-shaded-deps,!externals,!examples,!coverage'
> Java version: 1.8.0_202
> Maven version: 3.6.1
> 
> 
> -
> 
> [INFO] 
> 
> [INFO] Reactor Summary for Storm 2.0.0:
> [INFO]
> [INFO] Storm .. SUCCESS [  2.716 
> s]
> [INFO] Apache Storm - Checkstyle .. SUCCESS [  0.542 
> s]
> [INFO] multilang-javascript ... SUCCESS [  0.082 
> s]
> [INFO] multilang-python ... SUCCESS [  0.087 
> s]
> [INFO] multilang-ruby . SUCCESS [  0.076 
> s]
> [INFO] maven-shade-clojure-transformer  SUCCESS [  2.276 
> s]
> [INFO] storm-maven-plugins  SUCCESS [  2.877 
> s]
> [INFO] Shaded Deps for Storm Client ... SUCCESS [  7.925 
> s]
> [INFO] Storm Client ... SUCCESS [ 25.205 
> s]
> [INFO] storm-server ... SUCCESS [  8.549 
> s]
> [INFO] storm-clojure .. SUCCESS [  4.844 
> s]
> [INFO] Storm Core . FAILURE [  2.730 
> s]
> [INFO] Storm Webapp ... SKIPPED
> [INFO] storm-clojure-test . SKIPPED
> [INFO] storm-submit-tools . SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  58.980 s
> [INFO] Finished at: 2019-08-16T10:46:03+09:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile 
> (default-compile) on project storm-core: Compilation failure
> [ERROR] cannot access org.apache.yetus.audience.InterfaceAudience
> [ERROR]   class file for org.apache.yetus.audience.InterfaceAudience not found
> 
> -
> 
> Best regards,
> Minwoo Kang



Re: Storm Helm Chart example

2019-08-02 Thread Ethan Li
Sorry, I meant I am not aware of any official reference. Fat-fingered it. 


> On Aug 2, 2019, at 1:48 PM, Indranil Roy  
> wrote:
> 
> To clarify, my team has rolled our own charts for Kubernetes deployment 
> (Storm 1.x) , but I was hoping that there would be some official releases so 
> that we can take necessary guidance from it. Further, is there any version of 
> the docker images (https://hub.docker.com/_/storm/?tab=tags) that has 
> support for SLES 15?
> 
> On Fri, Aug 2, 2019 at 7:25 PM Ethan Li <ethanopensou...@gmail.com> wrote:
> Hi Indranil, 
> 
> I am aware of any of Helm Chart based installation reference.
> 
> 
> > On Aug 2, 2019, at 8:23 AM, Indranil Roy <indranil.roychowdh...@gmail.com> wrote:
> > 
> > Is there any official reference of Helm chart based installation of Storm ?
> 



Re: Storm Helm Chart example

2019-08-02 Thread Ethan Li
Hi Indranil, 

I am aware of any of Helm Chart based installation reference.


> On Aug 2, 2019, at 8:23 AM, Indranil Roy  
> wrote:
> 
> Is there any official reference of Helm chart based installation of Storm ?



Re: Apache Storm version 2.0 problem running local topology in maven

2019-07-03 Thread Ethan Li
I am also on Mac. My Java version is JDK 8, but it shouldn’t matter. Can you 
post the whole output so that I can take a look?

> On Jul 3, 2019, at 10:37 AM, damjan gjurovski  wrote:
> 
> Hey,
> Thank you for the fast response. I have definitely gone through the whole 
> output several times and I never saw my print lines. I will try it under Linux 
> and see if something changes. 
> 
> Regards,
> Damjan
> 
> On Wed, 3 Jul 2019, 17:29 Ethan Li <ethanopensou...@gmail.com> wrote:
> Hi Damjan,
> 
> Your topology is actually able to run locally and I can see the output 
> from your bolt: “The new sum is xxx”. 
> 
> However, (local)cluster.shutdown didn’t really work and keeps printing out 
> errors. I would like to take some time later to look into it. 
> 
> 
> By the way, to upgrade from older version of topology, you just need to 
> switch your dependency from storm-core to storm-client and recompile. 
> 
> Thanks,
> Ethan
> 
>> On Jul 3, 2019, at 8:05 AM, damjan gjurovski <gjurovski...@gmail.com> wrote:
>> 
>> Hello,
>> I am trying to run the new version of Storm, but for some reason when 
>> starting it locally none of my output statements are shown in the console, 
>> leading me to the conclusion that the Storm program is not executed at all. 
>> The output from Storm is enormous and I cannot see where the problem is. 
>> 
>> This is the content of my pom.xml file:
>> <groupId>com.example.exclamationTopologyEx</groupId>
>> <artifactId>ExclamationTopologtEx</artifactId>
>> <version>1.0-SNAPSHOT</version>
>> <properties>
>>   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>> </properties>
>> <dependencies>
>>   <dependency>
>>     <groupId>org.apache.storm</groupId>
>>     <artifactId>storm-client</artifactId>
>>     <version>2.0.0</version>
>>     <scope>provided</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.storm</groupId>
>>     <artifactId>storm-server</artifactId>
>>     <version>2.0.0</version>
>>   </dependency>
>> </dependencies>
>> <build>
>>   <plugins>
>>     <plugin>
>>       <groupId>org.codehaus.mojo</groupId>
>>       <artifactId>exec-maven-plugin</artifactId>
>>       <version>1.5.0</version>
>>       <executions>
>>         <execution>
>>           <goals>
>>             <goal>exec</goal>
>>           </goals>
>>         </execution>
>>       </executions>
>>       <configuration>
>>         <executable>java</executable>
>>         <includeProjectDependencies>true</includeProjectDependencies>
>>         <includePluginDependencies>false</includePluginDependencies>
>>         <classpathScope>compile</classpathScope>
>>         <mainClass>com.stormExample.SquareStormTopology</mainClass>
>>         <cleanupDaemonThreads>false</cleanupDaemonThreads>
>>       </configuration>
>>     </plugin>
>>     <plugin>
>>       <groupId>org.apache.maven.plugins</groupId>
>>       <artifactId>maven-compiler-plugin</artifactId>
>>       <version>3.6.1</version>
>>       <configuration>
>>         <source>1.8</source>
>>         <target>1.8</target>
>>       </configuration>
>>     </plugin>
>>   </plugins>
>> </build>
>> 
>> This is the main class for starting the topology:
>> public static void main(String[] args) throws Exception {
>> //used to build the toplogy
>> TopologyBuilder builder = new TopologyBuilder();
>> //add the spout with name 'spout' and parallelism hint of 5 executors
>> builder.setSpout("spout", new DataSpout(),5);
>> //add the Emitter bolt with the name 'emitter'
>> builder.setBolt("emitter",new EmitterBolt(),8).shuffleGrouping("spout");
>> 
>> Config conf = new Config();
>> //set to false to disable debug when running on production cluster
>> conf.setDebug(false);
>> //if there are arguments then we are running on a cluster
>> if(args!= null && args.length > 0){
>> //parallelism hint to set the number of workers
>> conf.setNumWorkers(3);
>> //submit the toplogy
>> StormSubmitter.submitTopology(args[0], conf, 
>> builder.createTopology());
>> }else{//we are running locally
>> //the maximum number of executors
>> conf.setMaxTaskParallelism(3);
>> 
>> //local cluster used to run locally
>> LocalCluster cluster = new LocalCluster();
>> //submitting the topology
>> cluster.submitTopology("emitter-topology",conf, 
>> builder.createTopology());
>> //sleep
>> Thread.sleep(1);
>> //shut down the cluster
>> cluster.shutdown();
>> }
>> }
>> 
>> Additionally, these are my spout and bolt implementations:
>> public class DataSpout extends BaseRichSpout {
>> 
>> SpoutOutputCollector _collector;
>> int nextNumber;
>> 
>> @Override
>> public void open(Map map, TopologyContext topologyContext, 
>> SpoutOutputCollector spoutOutputCollector) {
>> _collector = spoutOutput

Re: Apache Storm version 2.0 problem running local topology in maven

2019-07-03 Thread Ethan Li
Hi Damjan,

Your topology is actually able to run locally and I can see the output from 
your bolt: “The new sum is xxx”. 

However, (local)cluster.shutdown didn’t really work and keeps printing out 
errors. I would like to take some time later to look into it. 


By the way, to upgrade from older version of topology, you just need to switch 
your dependency from storm-core to storm-client and recompile. 

Thanks,
Ethan
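As a sketch of that dependency switch (coordinates taken from the pom quoted below; scope kept as in the original):

```xml
<!-- Replace the old storm-core dependency with storm-client when moving to Storm 2.x -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-client</artifactId>
  <version>2.0.0</version>
  <scope>provided</scope>
</dependency>
```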

> On Jul 3, 2019, at 8:05 AM, damjan gjurovski  wrote:
> 
> Hello,
> I am trying to run the new version of Storm, but for some reason when starting 
> it locally none of my output statements are shown in the console, leading me 
> to the conclusion that the Storm program is not executed at all. The output 
> from Storm is enormous and I cannot see where the problem is. 
> 
> This is the content of my pom.xml file:
> <groupId>com.example.exclamationTopologyEx</groupId>
> <artifactId>ExclamationTopologtEx</artifactId>
> <version>1.0-SNAPSHOT</version>
> <properties>
>   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> </properties>
> <dependencies>
>   <dependency>
>     <groupId>org.apache.storm</groupId>
>     <artifactId>storm-client</artifactId>
>     <version>2.0.0</version>
>     <scope>provided</scope>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.storm</groupId>
>     <artifactId>storm-server</artifactId>
>     <version>2.0.0</version>
>   </dependency>
> </dependencies>
> <build>
>   <plugins>
>     <plugin>
>       <groupId>org.codehaus.mojo</groupId>
>       <artifactId>exec-maven-plugin</artifactId>
>       <version>1.5.0</version>
>       <executions>
>         <execution>
>           <goals>
>             <goal>exec</goal>
>           </goals>
>         </execution>
>       </executions>
>       <configuration>
>         <executable>java</executable>
>         <includeProjectDependencies>true</includeProjectDependencies>
>         <includePluginDependencies>false</includePluginDependencies>
>         <classpathScope>compile</classpathScope>
>         <mainClass>com.stormExample.SquareStormTopology</mainClass>
>         <cleanupDaemonThreads>false</cleanupDaemonThreads>
>       </configuration>
>     </plugin>
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-compiler-plugin</artifactId>
>       <version>3.6.1</version>
>       <configuration>
>         <source>1.8</source>
>         <target>1.8</target>
>       </configuration>
>     </plugin>
>   </plugins>
> </build>
> 
> This is the main class for starting the topology:
> public static void main(String[] args) throws Exception {
> //used to build the toplogy
> TopologyBuilder builder = new TopologyBuilder();
> //add the spout with name 'spout' and parallelism hint of 5 executors
> builder.setSpout("spout", new DataSpout(),5);
> //add the Emitter bolt with the name 'emitter'
> builder.setBolt("emitter",new EmitterBolt(),8).shuffleGrouping("spout");
> 
> Config conf = new Config();
> //set to false to disable debug when running on production cluster
> conf.setDebug(false);
> //if there are arguments then we are running on a cluster
> if(args!= null && args.length > 0){
> //parallelism hint to set the number of workers
> conf.setNumWorkers(3);
> //submit the toplogy
> StormSubmitter.submitTopology(args[0], conf, 
> builder.createTopology());
> }else{//we are running locally
> //the maximum number of executors
> conf.setMaxTaskParallelism(3);
> 
> //local cluster used to run locally
> LocalCluster cluster = new LocalCluster();
> //submitting the topology
> cluster.submitTopology("emitter-topology",conf, 
> builder.createTopology());
> //sleep
> Thread.sleep(1);
> //shut down the cluster
> cluster.shutdown();
> }
> }
> 
> Additionally, these are my spout and bolt implementations:
> public class DataSpout extends BaseRichSpout {
> 
> SpoutOutputCollector _collector;
> int nextNumber;
> 
> @Override
> public void open(Map map, TopologyContext topologyContext, 
> SpoutOutputCollector spoutOutputCollector) {
> _collector = spoutOutputCollector;
> nextNumber = 2;
> }
> 
> @Override
> public void nextTuple() {
> if(nextNumber > 20){
> return;
> }else{
> _collector.emit(new Values(nextNumber));
> 
> nextNumber = nextNumber + 2;
> }
> }
> 
> @Override
> public void declareOutputFields(OutputFieldsDeclarer 
> outputFieldsDeclarer) {
> outputFieldsDeclarer.declare(new Fields("number"));
> }
> }
> public class EmitterBolt extends BaseBasicBolt {
> 
> int sumNumbers;
> 
> public EmitterBolt(){
> 
> }
> 
> @Override
> public void prepare(Map stormConf, TopologyContext context) {
> sumNumbers = 0;
> System.out.println("com.example.exclamationTopologyEx.EmitterBolt has 
> been initialized");
> }
> 
> @Override
> public void execute(Tuple tuple, BasicOutputCollector 
> basicOutputCollector) {
> int nextNumber = tuple.getIntegerByField("number");
> 
> sumNumbers+=nextNumber;
> 
> System.out.println("The new sum is: " + sumNumbers);
> }
> 
> @Override
> public void declareOutputFields(OutputFieldsDeclarer 
> outputFieldsDeclarer) {
> 
> }
> }
> 
> I use the maven command:
> mvn compile exec:java -Dstorm.topology=com.example.exclamationTopologyEx 
> to run the topology, but as I said, none of the print lines are shown in 
> the console. I have tried to search for a solution in the documentation 
> 

Re: Start command not propagated by Nimbus

2019-06-04 Thread Ethan Li
Hi Mitchell,


Does the UI show that this topology is fully scheduled? Are you using 
ResourceAwareScheduler? Because it’s possible that the scheduler cannot find 
enough resources to schedule this topology. 

Also, after starting the topology, you could log in to zookeeper and check if 
there is an assignment belonging to this topology. You can also read its content 
to get a better idea, but you will need some coding to do that since it’s 
serialized. 

It’s really hard to tell the root cause without more information. Is it 
possible for you to provide all the related nimbus.log, supervisor.log so I can 
take a look?

Best,
Ethan

> On May 17, 2019, at 11:38 AM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) 
>  wrote:
> 
> We have a topology that was never started, even though nimbus received the 
> start command. Supervisor never received a command to start this topology, so 
> the issue wasn't in our topology code. In the logs, I see:
> 
> 2019-05-11 10:07:28,087 INFO  nimbus [pool-14-thread-16] Activating 
> WingmanTopology4246: WingmanTopology4246-251-1557583643
> 
> There were a bunch of topologies started around the same time, and most of 
> them had the following message occur next:
> 
> [timer] Setting new assignment for topology id  Name>:
> 
> However, we did not see this logged for the topology that wasn't started. 
> When the cluster was stopped, we saw:
> 
> 2019-05-11 10:36:04,447 INFO  nimbus [pool-14-thread-4] Delaying event 
> :remove for 5 secs for WingmanTopology4246-251-1557583643
> 2019-05-11 10:36:04,457 INFO  nimbus [pool-14-thread-4] Adding topo to 
> history log: WingmanTopology4246-251-1557583643
> 
> 
> What could have caused this? There were 16 topologies submitted to be run in 
> total, and our storm.yaml file allocates more than enough slots under 
> supervisor.slots.ports.



Re: [ANNOUNCE] Apache Storm 2.0.0 Released

2019-05-30 Thread Ethan Li
It’s great news! Thank you very much for the release

Best,
Ethan



> On May 30, 2019, at 3:50 PM, Alexandre Vermeerbergen 
>  wrote:
> 
> Thanks you very much Taylor for releasing this Storm 2.0 major release !
> 
> Kind regards,
> Alexandre Vermeerbergen
> 
> On Thu, May 30, 2019 at 21:56, P. Taylor Goetz wrote:
>> 
>> The Apache Storm community is pleased to announce the release of Apache 
>> Storm version 2.0.0.
>> 
>> Storm is a distributed, fault-tolerant, and high-performance realtime 
>> computation system that provides strong guarantees on the processing of 
>> data. You can read more about Storm on the project website:
>> 
>> http://storm.apache.org
>> 
>> Downloads of source and binary distributions are listed in our download
>> section:
>> 
>> http://storm.apache.org/downloads.html
>> 
>> You can read more about this release in the following blog post:
>> 
>> http://storm.apache.org/2019/05/30/storm200-released.html
>> 
>> Distribution artifacts are available in Maven Central at the following 
>> coordinates:
>> 
>> groupId: org.apache.storm
>> artifactId: storm-core
>> version: 2.0.0
>> 
>> The full list of changes is available here[1]. Please let us know [2] if you 
>> encounter any problems.
>> 
>> Regards,
>> 
>> The Apache Storm Team
>> 
>> [1]: 
>> http://www.us.apache.org/dist/storm/apache-storm-2.0.0/RELEASE_NOTES.html
>> [2]: https://issues.apache.org/jira/browse/STORM



Re: Doubt about ShuffleGrouping and LocalOrShuffleGrouping

2019-05-16 Thread Ethan Li
Hi Mohit,

Your understanding about ShuffleGrouping is correct. 

For LocalOrShuffleGrouping,  the upstream spout/bolt will only send tuples to 
the downstream bolt on the same worker (not machine) if it can. Otherwise, it 
will be just  like ShuffleGrouping. 

Code: 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L86-L88
 



Thanks
Ethan
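The selection rule above can be sketched in plain Java. This only mirrors the behavior as described, it is not Storm's actual GrouperFactory code:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LocalOrShuffleSketch {
    // If any target task runs in the same worker, restrict the shuffle
    // to those tasks; otherwise shuffle across all target tasks.
    static List<Integer> candidatePool(List<Integer> allTargets,
                                       List<Integer> sameWorkerTargets) {
        return sameWorkerTargets.isEmpty() ? allTargets : sameWorkerTargets;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4);
        // Task 2 is on the same worker, so only it is considered
        System.out.println(candidatePool(all, Arrays.asList(2)));
        // No local tasks: falls back to a plain shuffle over all tasks
        System.out.println(candidatePool(all, Collections.emptyList()));
    }
}
```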


> On May 14, 2019, at 11:40 AM, Mohit Goyal  wrote:
> 
> Hi,
> 
> Can someone here tell me the difference between shuffleGrouping and 
> localOrShuffleGrouping?
> My use case is: I have supervisor running in 3 different instances and 
> workers/tasks distributed among them.
> My understanding of grouping is: if I use shuffleGrouping, it will send the tuple 
> to any task of the bolt, be it on the same machine or a different one. But if I use 
> localOrShuffleGrouping, it will try to send the tuple to a task of the next bolt 
> which lies on the same machine, and if it's overloaded, then it'll send it to a 
> task residing on another machine.
> Please let me know the difference. Thanks in advance.
> 
> Regards,
> Mohit Goyal



Re: Storm logger file

2019-04-25 Thread Ethan Li
Hi Abhishek,

You can use “storm set_log_level” command to set log level for your topology, 
for example, "storm set_log_level -l ROOT=DEBUG:30 topology-name”.
You can run "storm help set_log_level” for the usage of it.

Thanks.

-Ethan


> On Apr 5, 2019, at 7:08 AM, Abhishek Raj  wrote:
> 
> Hi,
> 
> On Storm UI, we have an option to enable logger for specific classes using 
> the following section:
> 
> [screenshot of the logger-configuration section in the Storm UI]
> 
> This is great but every time we redeploy our topology, we have to redefine 
> all our loggers in the UI again which is quite a bit of work if there are a 
> lot of classes. Is there a way to submit all the logger classes to Storm when 
> deploying a topology and have them automatically added to the UI? The loggers 
> would be disabled by default and we can enable them whenever needed.
> 
> Not sure if this feature already exists. If not, could be very useful.
> 
> Let me know what you guys think.
> 
> Thanks.



Re: Question about Evenscheduler in Storm 2.0

2018-11-30 Thread Ethan Li
Hi Junguk,

Please set "storm.scheduler" 
(https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/DaemonConfig.java#L96) 
to change the scheduler.


- Ethan
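For completeness, a sketch of the relevant storm.yaml line on the Nimbus host. Note that storm.scheduler takes a full scheduler class, whereas topology.scheduler.strategy expects an IStrategy implementation for the resource-aware scheduler:

```yaml
# storm.yaml on the Nimbus node
storm.scheduler: "org.apache.storm.scheduler.EvenScheduler"
```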

> On Nov 30, 2018, at 12:59 PM, Junguk Cho  wrote:
> 
> Hi,
> 
> I am trying to use "Evenscheduler" in Storm 2.0.
> I configured scheduler like this.
> topology.scheduler.strategy: "org.apache.storm.scheduler.EvenScheduler"
> 
> However, when I ran Nimbus, it said
> Exception in thread "main" java.lang.IllegalArgumentException: Field
> VALIDATE_TOPOLOGY_SCHEDULER_STRATEGY with value
> org.apache.storm.scheduler.EvenScheduler does not implement
> org.apache.storm.scheduler.resource.strategies.scheduling.IStrategy
> 
> Would you please let me know how to configure "EvenScheduler" in Storm 2.0?
> 
> Thanks,
> Junguk



Re: Display ackers executors

2018-07-20 Thread Ethan Li
Please try 
http://localhost:8080/api/v1/topology/my-topology-id/component/__acker?sys=true
 



> On Jul 19, 2018, at 11:50 AM, Alessio Pagliari  wrote:
> 
> Thank you Ethan, I totally missed that button. 
> 
> Is there also a way to retrieve information about ‘__acker’ through REST 
> APIs, like any other component of the topology? Calling 
> http://localhost:8080/api/v1/topology/my-topology-id/component/__acker 
> doesn’t give me the same output as other components.
> 
> Cheers,
> 
> Alessio
> 
> 
>> On 19 Jul 2018, at 18:20, Ethan Li wrote:
>> 
>> At the bottom of the topology page, you can find the “Show System Stats” 
>> button. Click on it and you will see __acker.
>> 
>> Best,
>> Ethan
>> 
>>> On Jul 19, 2018, at 11:04 AM, Alessio Pagliari wrote:
>>> 
>>> Hello everybody,
>>> 
>>> I cannot find a way to show in the UI or retrieve through REST APIs the 
>>> placement of the acker executors. Is there a way to get this information? I 
>>> remember that in some topologies in the past I was able to see them in the 
>>> UI with the name ‘__acker’, but now I can’t understand how to display them 
>>> again.
>>> 
>>> Thank you,
>>> 
>>> Alessio
>>> 
>> 
> 
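
The REST call above is easy to script; a small sketch, where the host, port, and topology id are placeholders and the endpoint path plus the sys=true parameter come from the answer:

```python
from urllib.parse import urlencode


def acker_stats_url(base, topology_id, include_sys=True):
    """Build the Storm UI REST URL for the __acker system component.

    sys=true makes the UI include system components in the response,
    which is why the plain component URL returns less data.
    """
    query = urlencode({"sys": str(include_sys).lower()})
    return f"{base}/api/v1/topology/{topology_id}/component/__acker?{query}"


print(acker_stats_url("http://localhost:8080", "my-topology-id"))
# → http://localhost:8080/api/v1/topology/my-topology-id/component/__acker?sys=true
```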



Re: Display ackers executors

2018-07-19 Thread Ethan Li
At the bottom of the topology page, you can find the “Show System Stats” 
button. Click on it and you will see __acker.

Best,
Ethan

> On Jul 19, 2018, at 11:04 AM, Alessio Pagliari  wrote:
> 
> Hello everybody,
> 
> I cannot find a way to show in the UI or retrieve through REST APIs the 
> placement of the acker executors. Is there a way to get this information? I 
> remember that in some topologies in the past I was able to see them in the UI 
> with the name ‘__acker’, but now I can’t understand how to display them again.
> 
> Thank you,
> 
> Alessio
> 



Re: worker.yaml and worker.pid in workers.artifacts directory

2018-04-18 Thread Ethan Li
As far as I know, the path has to be {STORM_WORKERS_ARTIFACTS_DIR}/{worker 
id}/{worker port}/worker.pid. Maybe someone else has a different answer.



> On Apr 18, 2018, at 4:24 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) 
> <mrathb...@bloomberg.net> wrote:
> 
> I was hoping to change where they are sent at a more granular level. For 
> example, say I update the worker.xml file to update the worker.log file 
> paths. Say I set it to 
> {STORM_WORKERS_ARTIFACTS_DIR}//logs. Making this 
> change, worker.log* will be sent to 
> {STORM_WORKERS_ARTIFACTS_DIR}//logs, but 
> worker.yaml and worker.pid will still be sent to 
> {STORM_WORKERS_ARTIFACTS_DIR}/{worker id}/{worker port}. Is there anyway to 
> specify the path of worker.yaml and worker.pid as the same as the other log 
> files?
> 
> Sent from Bloomberg Professional for Android
> 
> 
> - Original Message -
> From: Ethan Li <user@storm.apache.org>
> At: 18-Apr-2018 16:19:31
> 
> Hi Mitchell,
> 
> 
> worker.yaml and worker.pid are used by storm itself.  
> 
> They will be put into {STORM_WORKERS_ARTIFACTS_DIR}/{worker id}/{worker port} 
> (see 
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/ConfigUtils.java#L259-L261
> and 
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/ConfigUtils.java#L104-L114)
>  
> 
> If you want to change their location, you can set this config: 
> “STORM_WORKERS_ARTIFACTS_DIR” 
> (https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1751-L1757)
> 
> 
> Ethan
> 
> 
>> On Apr 18, 2018, at 1:12 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) 
>> <mrathb...@bloomberg.net> wrote:
>> 
>> I have noticed that when I run a topology, all the worker related logs are 
>> written by default to 
>> "${sys:workers.artifacts}/${sys:storm.id}/${sys:worker.port}/${sys:logfile.name}", 
>> as specified by the default worker.xml file. I have noticed that if I update 
>> the filename used in the worker.xml file, all of the logs are sent to the 
>> specified directory. However, the original specified path will still be 
>> populated with just worker.pid and worker.yaml files. What are these files 
>> used for? Is it possible to update the log directory specified by worker.xml 
>> and then have the worker.pid and worker.yaml files written to that same 
>> directory?
> 



Re: worker.yaml and worker.pid in workers.artifacts directory

2018-04-18 Thread Ethan Li
Hi Mitchell,


worker.yaml and worker.pid are used by storm itself.  

They will be put into {STORM_WORKERS_ARTIFACTS_DIR}/{worker id}/{worker port} 
(see 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/ConfigUtils.java#L259-L261
and 
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/ConfigUtils.java#L104-L114)

If you want to change their location, you can set this config: 
“STORM_WORKERS_ARTIFACTS_DIR” 
(https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1751-L1757)
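
For reference, a storm.yaml sketch of that override. To my understanding, storm.workers.artifacts.dir is the config key behind the STORM_WORKERS_ARTIFACTS_DIR constant linked above; the path shown is a placeholder:

```yaml
# storm.yaml: move the whole per-worker artifacts tree. Workers then
# write worker.yaml, worker.pid, and (by default) worker.log under
# <dir>/{worker id}/{worker port}/. A relative path resolves under
# storm.local.dir.
storm.workers.artifacts.dir: "/var/log/storm/workers-artifacts"
```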


Ethan


> On Apr 18, 2018, at 1:12 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) 
>  wrote:
> 
> I have noticed that when I run a topology, all the worker related logs are 
> written by default to 
> "${sys:workers.artifacts}/${sys:storm.id}/${sys:worker.port}/${sys:logfile.name}",
>  as specified by the default worker.xml file. I have noticed that if I update 
> the filename used in the worker.xml file, all of the logs are sent to the 
> specified directory. However, the original specified path will still be 
> populated with just worker.pid and worker.yaml files. What are these files 
> used for? Is it possible to update the log directory specified by worker.xml 
> and then have the worker.pid and worker.yaml files written to that same 
> directory?



Re: Running multiple topologies

2018-04-12 Thread Ethan Li
Hi Daniel,

I am not sure if I understand your questions correctly, but would the Resource 
Aware Scheduler help? 
https://github.com/apache/storm/blob/master/docs/Resource_Aware_Scheduler_overview.md
 



Thanks
Ethan
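
To make the suggestion concrete: under the Resource Aware Scheduler, topologies declare resource needs per component and the scheduler packs executors accordingly, instead of each topology pinning whole worker JVMs via setNumWorkers(). A hedged storm.yaml sketch; the key names follow the RAS documentation linked above and the numbers are placeholders:

```yaml
# storm.yaml: enable RAS cluster-wide.
storm.scheduler: "org.apache.storm.scheduler.resource.ResourceAwareScheduler"

# Default per-component requests (overridable per topology/component):
topology.component.resources.onheap.memory.mb: 128.0
topology.component.cpu.pcore.percent: 10.0
topology.worker.max.heap.size.mb: 768.0
```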


> On Apr 12, 2018, at 1:45 PM, Hannum, Daniel  
> wrote:
> 
> I think I know the answer, but I can’t find docs, so check my thinking please.
>  
> We’re going to be going from 1 topology to maybe 5 on our (v1.1.x) cluster. I 
> think the way I share the cluster is by setting NumWorkers() on all the 
> topologies so we divide the available JVM’s up. If that is true, then don’t 
> we have the problem that I’m tying up resources if one is idle? Or that I 
> can’t move a JVM from topology A to topology B if B is under load?
>  
> So my questions are: 
> Do I understand this correctly?
> Is there any way to improve this situation besides just getting more hosts 
> and being ok with some portion of them being idle?
> Is this going to get better in Storm 2?
>  
> Thanks!



Re: Storm Kerberos starting topology fails with "The TGT found is not renewable"

2018-01-11 Thread Ethan Li
Hi Prakash,

It might sound silly, but did you check whether the ticket you think you are 
using is the one that’s actually being used? I fixed the “The TGT found is not 
renewable” problem in my use case before, but I’m sorry I can’t remember the 
details.

Best,
Ethan
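
One way to check is sketched below. KRB5CCNAME and the -r flag are standard MIT Kerberos usage; the renew lifetime and the principal variable are placeholders, and the KDC's maximum renewable life caps what kinit can actually grant:

```shell
# Which credential cache will the submitting JVM actually read?
echo "${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"

# Inspect its flags: an 'R' in the Flags field means the TGT is renewable.
klist -f

# Re-obtain an explicitly renewable TGT for the submitting principal.
kinit -r 7d "$STORM_PRINCIPAL"
```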

> On Jan 11, 2018, at 3:10 PM, prakash r  wrote:
> 
> Hello All,
> 
> Any suggestion on this ?
> 
> Is there anyway we can avoid this TGT Renewal check or how to resolve.
> 
> Regards,
> Prakash R
> 
> On Tue, Jan 9, 2018 at 3:31 PM, prakash r wrote:
> Hello,
> 
> We are facing issue with starting a topology when Storm is kerberosed.
> 
> 1189 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds 
> [org.apache.storm.security.auth.kerberos.AutoTGT@129b4fe2]
> 1189 [main] INFO  o.a.s.StormSubmitter - Running 
> org.apache.storm.security.auth.kerberos.AutoTGT@129b4fe2
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: The TGT found is not renewable
>   at 
> org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:103)
>   at 
> org.apache.storm.StormSubmitter.populateCredentials(StormSubmitter.java:94)
>   at 
> org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:214)
>   at 
> org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:310)
>   at 
> org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:157)
>   at storm.starter.WordCountTopology.main(WordCountTopology.java:77)
> Caused by: java.lang.RuntimeException: The TGT found is not renewable
>   at 
> org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:94)
>  
>  ... 5 more
> 
> When we check the Kerberos principal, it has the R flag as well.
> 
> We tried even regenerating the keytabs, this problem is not resolved.
> 
> When we submit from new keytab principal, this is working fine.
> 
> Can you please suggest, is there anyway we can avoid this TGT Renewal check 
> or how to resolve.
> 
> OS version : 
> Red Hat Enterprise Linux Server release 7.4 (Maipo)
> 
> 
> Problematic principal details :
> [storm@cbro-test-stm1 ~]$ klist -f
> Ticket cache: FILE:/tmp/krb5cc_1021
> Default principal: storm-_mas...@xx.com 
> 
> 
> Valid starting   Expires  Service principal
> 01/06/2018 22:30:40  01/07/2018 08:30:40  krbtgt/xx@xx.com 
> 
> renew until 01/12/2018 13:54:47, Flags: FRIAT
> 
> 
> 
> Working principal details :
> [metron@cbro-test-edg4 ~]$ klist -f
> Ticket cache: FILE:/tmp/krb5cc_1024
> Default principal: met...@xx.com 
> 
> Valid starting   Expires  Service principal
> 01/09/2018 15:28:47  01/10/2018 01:28:47  krbtgt/xx@xx.com 
> 
> renew until 01/16/2018 15:28:47, Flags: FRIA
> 
> 
> Regards,
> Prakash R
> 



Re: how to keep storm running when console ends

2017-08-04 Thread Ethan Li
That makes sense to me. I just don't know why I didn't receive Bobby's email in 
this thread. 

On Friday, August 4, 2017 10:34 AM, I PVP <i...@hotmail.com> wrote:
 

Hi J. R. Pauley, 
If it helps: I use supervisord not only for Storm but also for other 
solutions within the ecosystem.
- IP VP 

On August 4, 2017 at 12:29:41 PM, Bobby Evans (ev...@yahoo-inc.com) wrote:

Java by default also tries to keep running if something unexpected happens.  If 
an exception is not caught and goes all the way up the stack, that thread will 
just print out a stack trace to stderr and then quietly stop running.  Do you 
know how many zombie daemons I have had to debug/fix because something slightly 
unexpected happened on a critical thread?  I personally added in a generic 
uncaught exception handler to all of the Hadoop daemons years ago because it 
was happening all of the time in the early days of YARN.  Storm is almost 100% 
immune to those situations, and has been since the beginning.
That is not to say that storm is not stable.  On most of our clusters the only 
time nimbus or a supervisor goes down is when we upgrade it.  We have had 
daemons running for months.  And if they do go down they typically recover in a 
few seconds, and my team gets an alert so we can debug what happened.  Running 
storm (and I would argue any critical daemon) under supervision for production 
environments is just a best practice. 
For debugging/testing something then go ahead and use nohup.

- Bobby
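
A sketch of the supervision Bobby describes, using supervisord as I PVP mentions. The paths, user, and log locations are assumptions, and a similar [program:...] entry would be repeated for the supervisor, ui, and drpc daemons:

```ini
; /etc/supervisord.d/storm-nimbus.ini (example location)
[program:storm-nimbus]
command=/opt/storm/apache-storm-1.0.2/bin/storm nimbus
user=storm
autostart=true
autorestart=true        ; restart the daemon automatically if it dies
startsecs=10            ; count it as started only after 10s of uptime
stdout_logfile=/var/log/storm/nimbus.out
redirect_stderr=true
```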


On Friday, August 4, 2017, 9:43:10 AM CDT, J.R. Pauley <jrpau...@gmail.com> 
wrote:

thanks for all the responses. screen and nohup seem easy to adopt. 
I guess for something intended to run on large clusters I'm surprised at the 
default behavior of storm. It seems quite the opposite of what I would've 
guessed. 
I also have a Heron instance running in similar fashion and its default 
behavior is to keep running, which seems more like what I expected.
On Fri, Aug 4, 2017 at 9:28 AM, M. Aaron Bossert <maboss...@gmail.com> wrote:

Try adding nohup at the beginning of each of those commands.  Nohup prevents a 
process from terminating when its parent shell is closed.
nohup /opt/storm/apache*/bin/storm nimbus &

Sent from my iPhone
On Aug 4, 2017, at 09:08, Ethan Li <etha...@yahoo-inc.com> wrote:


Hi,
The Storm docs recommend using daemontools or monit, which provide error 
recovery and automatic restarts. 
I use the Linux "nohup" or "screen" commands for simplicity. 
Ethan 


On Friday, August 4, 2017 7:29 AM, J.R. Pauley <jrpau...@gmail.com> wrote:


this seems silly but I have not figured out how to keep nimbus, supervisor 
running after console session ends. zookeeper survives but supervisor and 
nimbus shut down when the console session ends.
I have storm 1.0.2 installed under /opt/storm and starting up as:
/opt/storm/apache*/bin/storm nimbus &
/opt/storm/apache*/bin/storm supervisor &
/opt/storm/apache*/bin/storm ui &
/opt/storm/apache*/bin/storm drpc &

I know I can add a while loop to keep the console active but there has to be a 
better way






   

Re: how to keep storm running when console ends

2017-08-04 Thread Ethan Li
Hi,
The Storm docs recommend using daemontools or monit, which provide error 
recovery and automatic restarts. 
I use the Linux "nohup" or "screen" commands for simplicity. 
Ethan 
 

On Friday, August 4, 2017 7:29 AM, J.R. Pauley  wrote:
 

 this seems silly but I have not figured out how to keep nimbus, supervisor 
running after console session ends. zookeeper survives but supervisor and 
nimbus shut down when the console session ends.
I have storm 1.0.2 installed under /opt/storm and starting up as:
/opt/storm/apache*/bin/storm nimbus &
/opt/storm/apache*/bin/storm supervisor &
/opt/storm/apache*/bin/storm ui &
/opt/storm/apache*/bin/storm drpc &

I know I can add a while loop to keep the console active but there has to be a 
better way
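
The nohup variant of the startup above can be sketched as follows; the paths come from the question, and the log file names are assumptions:

```shell
# nohup detaches the daemons from the login shell so they survive the
# end of the console session; output goes to explicit log files.
nohup /opt/storm/apache*/bin/storm nimbus     > /tmp/nimbus.out     2>&1 &
nohup /opt/storm/apache*/bin/storm supervisor > /tmp/supervisor.out 2>&1 &
nohup /opt/storm/apache*/bin/storm ui         > /tmp/ui.out         2>&1 &
nohup /opt/storm/apache*/bin/storm drpc       > /tmp/drpc.out       2>&1 &
```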

   

Re: Question about scheduler.display.resource option.

2017-07-17 Thread Ethan Li
Any help with this? Thanks.

Ethan Li
Tech Yahoo, Software Dev Eng, Interm
1908 S. First St., Champaign, IL 61820
etha...@yahoo-inc.com
Mobile: 812-361-1879 

On Thursday, July 13, 2017 3:44 PM, Ethan Li <etha...@yahoo-inc.com> wrote:
 

Hello,
I am facing a really strange issue with the "scheduler.display.resource" 
option. I have a secure Storm cluster with only one machine node, using one 
nimbus and one supervisor. I ran the same WordCountTopology example in the 
following experiments and checked the "Assigned memory (MB)" on the Storm UI. 
The results are as follows:
scheduler | scheduler.display.resource | Assigned Memory (MB)
default   | true                       | 0
default   | false                      | 164
RAS       | true                       | 896
RAS       | false                      | 328

*RAS=ResourceAwareScheduler

I clean up all the logs, storm-local, zookeeper and restart all daemons every 
time I change the configuration.
Did I do something wrong? Have you seen this problem before? Is the 
"scheduler.display.resource" supposed to influence the assigned memory? 
Thanks very much!

Best,

Ethan Li
Tech Yahoo, Software Dev Eng, Interm
1908 S. First St., Champaign, IL 61820
etha...@yahoo-inc.com
Mobile: 812-361-1879