Re: When is the MAX_SPOUT_PENDING limit applied

2019-04-03 Thread Alessio Pagliari
Hi Jayant,

MAX_SPOUT_PENDING is applied in each spout task individually. 

Each spout task keeps a count of the tuples it has sent, increasing it on each emit 
and decreasing it when it receives the final ack for a “pending tuple” 
(a tuple that has been sent but is still waiting for its ack). 
If the number of pending tuples rises above the value you set, let’s say 500, 
the spout automatically slows down until it reaches an emission rate at which 
the number of pending tuples stabilises below 500.
This mechanism is applied per spout task, meaning that each spout can be throttled 
at different times, and MAX_SPOUT_PENDING represents the maximum number of 
un-acked tuples that a single spout will allow in flight.
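
For reference, this is roughly how you cap it when building the topology (just a 
sketch; MySpout and MyBolt are placeholders for your own components, and 500 is 
the example value from above):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class PendingLimitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // MySpout / MyBolt are placeholders; the spout must emit with a message id,
        // otherwise nothing is ever "pending" and the limit has no effect.
        builder.setSpout("spout", new MySpout(), 3);   // 3 spout tasks, each with its own count
        builder.setBolt("bolt", new MyBolt(), 4).shuffleGrouping("spout");

        Config conf = new Config();
        // Per spout task: Storm stops calling nextTuple() on a task while that task
        // has 500 emitted-but-not-yet-acked tuples in flight.
        conf.setMaxSpoutPending(500);

        StormSubmitter.submitTopology("pending-demo", conf, builder.createTopology());
    }
}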

Alessio

> On 3 Apr 2019, at 14:45, Joshua Martell  wrote:
> 
> I’m pretty sure it’s per spout task. And they’re not coordinated. Each task 
> gets its own count. 
> 
> Joshua
> On Wed, Apr 3, 2019 at 4:50 AM Jayant Sharma  > wrote:
> Hi,
> 
> Can someone please explain at what point storm checks and applies 
> MAX_SPOUT_PENDING limit. Also, is this limit applied per executor(or Task if 
> that's the case) of spout or aggregated and applied over all the executors of 
> spout.
> Suppose I have 3 executors and 3 tasks of spouts, each of them fetch 1 
> message from input source. If my MAX_SPOUT_PENDING is 2, will only 2 
> executors be able to send the tuple forward or all 3 will send and further 
> nextTuple calls on all will be blocked?
> 
> Thanks,
> Jayant Sharma



Modifying tasks number during scheduling

2018-09-19 Thread Alessio Pagliari
Hi guys,

I was playing a bit with a custom scheduler to understand how it works. 
I was wondering: is there a way to change the number of tasks for a certain 
component from the Scheduler class? For example, what if during scheduling 
I wanted to increase or decrease the number of spouts? 
I cannot find any API that allows me to do that.

I don’t like the idea of having to modify Storm’s code deep down, especially 
since I don’t know Clojure and I’m not using 2.0.0 yet. 
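
For reference, this is the hook I’m working from: a minimal sketch assuming the 
usual IScheduler interface from org.apache.storm.scheduler (the delegation to 
EvenScheduler at the end is just illustrative):

import java.util.Map;

import org.apache.storm.scheduler.Cluster;
import org.apache.storm.scheduler.EvenScheduler;
import org.apache.storm.scheduler.IScheduler;
import org.apache.storm.scheduler.Topologies;
import org.apache.storm.scheduler.TopologyDetails;

// Sketch of a custom scheduler: I can inspect the topologies and decide where the
// existing executors go, but I don't see any API to change a component's task count.
public class MyCustomScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // nothing to do for this sketch
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        for (TopologyDetails topology : topologies.getTopologies()) {
            // The executor/task layout is fixed at submission time.
            System.out.println(topology.getName() + " has "
                    + topology.getExecutors().size() + " executors to place");
        }
        // Fall back to the default even placement for the actual assignment.
        new EvenScheduler().schedule(topologies, cluster);
    }
}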

Thank you,

Alessio



Re: Display ackers executors

2018-07-20 Thread Alessio Pagliari
It works, thanks.

Alessio

> On 20 Jul 2018, at 15:39, Ethan Li  wrote:
> 
> Please try 
> http://localhost:8080/api/v1/topology/my-topology-id/component/__acker?sys=true
>  
> 
> 
> 
>> On Jul 19, 2018, at 11:50 AM, Alessio Pagliari > <mailto:pagli...@i3s.unice.fr>> wrote:
>> 
>> Thank you Ethan, I totally missed that button. 
>> 
>> Is there also a way to retrieve information about ‘__acker’ through REST 
>> APIs, like any other component of the topology? Calling 
>> http://localhost:8080/api/v1/topology/my-topology-id/component/__acker 
>> <http://localhost:8080/api/v1/topology/my-topology-id/component/__acker> 
>> doesn’t give me the same output as other components.
>> 
>> Cheers,
>> 
>> Alessio
>> 
>> 
>>> On 19 Jul 2018, at 18:20, Ethan Li >> <mailto:ethanopensou...@gmail.com>> wrote:
>>> 
>>> At the bottom of the topology page, you can find “Show System Stats” 
>>> button. Click on it and you will see __acker.
>>> 
>>> Best,
>>> Ethan
>>> 
>>>> On Jul 19, 2018, at 11:04 AM, Alessio Pagliari >>> <mailto:pagli...@i3s.unice.fr>> wrote:
>>>> 
>>>> Hello everybody,
>>>> 
>>>> I cannot find a way to show in the UI or retrieve through REST APIs the 
>>>> placement of the acker executors. Is there a way to get this information? 
>>>> I remember that in some topologies in the past I was able to see them in 
>>>> the UI with the name ‘__acker’, but now I can’t understand how to display 
>>>> them again.
>>>> 
>>>> Thank you,
>>>> 
>>>> Alessio
>>>> 
>>> 
>> 
> 



Re: Display ackers executors

2018-07-19 Thread Alessio Pagliari
Thank you Ethan, I totally missed that button. 

Is there also a way to retrieve information about ‘__acker’ through REST APIs, 
like any other component of the topology? Calling 
http://localhost:8080/api/v1/topology/my-topology-id/component/__acker 
doesn’t give me the same output as other components.

Cheers,

Alessio


> On 19 Jul 2018, at 18:20, Ethan Li  wrote:
> 
> At the bottom of the topology page, you can find “Show System Stats” button. 
> Click on it and you will see __acker.
> 
> Best,
> Ethan
> 
>> On Jul 19, 2018, at 11:04 AM, Alessio Pagliari  wrote:
>> 
>> Hello everybody,
>> 
>> I cannot find a way to show in the UI or retrieve through REST APIs the 
>> placement of the acker executors. Is there a way to get this information? I 
>> remember that in some topologies in the past I was able to see them in the 
>> UI with the name ‘__acker’, but now I can’t understand how to display them 
>> again.
>> 
>> Thank you,
>> 
>> Alessio
>> 
> 



Display ackers executors

2018-07-19 Thread Alessio Pagliari
Hello everybody,

I cannot find a way to show in the UI, or retrieve through the REST APIs, the 
placement of the acker executors. Is there a way to get this information? I 
remember that in some topologies in the past I was able to see them in the UI 
under the name ‘__acker’, but now I can’t figure out how to display them again.

Thank you,

Alessio



New Metrics2 error in documentation

2018-05-24 Thread Alessio Pagliari
Hello,

While trying the new Metrics2 in Storm 1.2.1, I noticed an error in the online 
documentation (http://storm.apache.org/releases/1.2.1/metrics_v2.html): 

> 
> CSV Reporter (org.apache.storm.metrics2.reporters.CsvReporter): Reports 
> metrics to a CSV file.
> 

The CSV reporter class is not org.apache.storm.metrics2.reporters.CsvReporter; 
it should be org.apache.storm.metrics2.reporters.CsvStormReporter 
(https://storm.apache.org/releases/1.2.1/javadocs/org/apache/storm/metrics2/reporters/CsvStormReporter.html).

I don’t know how to edit the page myself, so I just followed the hint in the 
documentation itself and sent an email here.

Thank you,

Alessio




Storm benchmarks: UI stat updates

2018-04-20 Thread Alessio Pagliari
Hi everybody,

I’m doing some benchmarking to check the behaviour of Storm over time: every 10 
seconds I retrieve, via the UI REST API, the total number of tuples emitted per 
executor, and from that I compute the value for the last 10 seconds. I noticed 
that pretty regularly I get a peak in the measurements, for example: all the 
values are around 500k tuples/10s, but sometimes I get a value of ~1M 
tuples/10s. 
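
Roughly, my sampling loop looks like this (just a sketch; fetchTotalEmitted() is 
a placeholder for the HTTP call to the UI REST API, I’m not reproducing the JSON 
parsing here):

// Sketch of the sampling loop: poll the cumulative "emitted" counter every 10 s
// and report the difference as the throughput of the last window.
public class ThroughputSampler {

    // Placeholder: in my setup this does an HTTP GET on /api/v1/topology/<topology-id>
    // and sums the cumulative emitted counts of the executors.
    static long fetchTotalEmitted() {
        throw new UnsupportedOperationException("query the Storm UI REST API here");
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = fetchTotalEmitted();
        while (true) {
            Thread.sleep(10_000);
            long current = fetchTotalEmitted();
            // Tuples emitted in the last 10 seconds; this is where the ~500k vs ~1M spikes show up.
            System.out.println((current - previous) + " tuples/10s");
            previous = current;
        }
    }
}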

I wanted to know how, and with which frequency, the UI stats are updated, and 
whether it’s normal to get these outliers in the measurements. Does someone have 
an explanation, or a link where I can dig deeper into this aspect?

Thank you,

--
Alessio Pagliari
Scale/Signet Team, PhD Student
Université Côte d’Azur, CNRS, I3S
Website: http://www.i3s.unice.fr/~pagliari/





Re: Running multiple topologies

2018-04-12 Thread Alessio Pagliari
Hi Daniel,

I’ll try to answer as best I can. 

Basically, Storm’s default scheduler performs a round-robin placement: when you 
define the number of workers (i.e. JVMs) for a topology, it first places them 
across all the nodes of your cluster in a cyclic way (round robin), then with 
the same algorithm it places the executors in the workers (the workers now play 
the role the nodes played before). Storm orders the nodes, the slots in each 
node, and the executors, and it uses that order for every placement. It’s fairly 
deterministic if the available nodes are always the same.

That means that if, for example, you have 3 nodes and 4 workers, the 1st worker 
goes to the 1st node, the 2nd to the 2nd, the 3rd to the 3rd, and the 4th worker 
wraps back around to the first node, leaving you with 2 workers on the first 
node and 1 on each of the others. If you deploy a second topology it follows the 
same pattern, always starting from the 1st node, then the 2nd and so on, until 
the available slots on the 1st node run out; from that point on, Storm starts 
the placement from the 2nd node.
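
For concreteness, the worker count in the example above is just the topology 
setting below (a sketch; where those workers end up is then entirely up to the 
scheduler):

import org.apache.storm.Config;

public class WorkerCountExample {
    public static void main(String[] args) {
        // The "4 workers" of the example above is just this topology-level setting
        // (equivalent to topology.workers: 4); with the default scheduler on 3 nodes
        // they end up distributed 2 + 1 + 1 as described.
        Config conf = new Config();
        conf.setNumWorkers(4);
        System.out.println(conf.get(Config.TOPOLOGY_WORKERS));   // -> 4
    }
}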

One more thing to consider, which may be obvious but is worth making explicit: a 
worker cannot be shared between multiple topologies, meaning that if you have 
one worker with even a single executor in it, you can’t place executors from 
other topologies in it. This also means that you cannot move a JVM from a 
topology A to a topology B, unless you kill topology A (I’m not sure whether 
deactivating it is enough), “rebalance” topology B and redeploy topology A; that 
way the node slots end up swapped between the two topologies.

If you’re interested in improving resource management (in terms of CPU and RAM) 
and the raw throughput of your application, without worrying about load 
balancing your topology cluster-wide, you can use the Resource Aware Scheduler 
(RAS), as suggested by Ethan. Note, however, that RAS focuses on locality: to 
avoid sending tuples around the network, it places workers and executors as 
close to each other as possible, and most of the time you end up with the entire 
topology on a single node. So a possible scenario is a 5-node cluster with 5 
topologies, each deployed on a different machine; unlike the default scheduler, 
RAS starts placing on the machine with the most available resources.
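
If you go the RAS route, the per-component resource hints look roughly like this 
(a sketch based on the 1.x builder API; MySpout, MyBolt and the numbers are 
placeholders):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class RasHintsExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // MySpout / MyBolt are placeholders; the numbers are per-executor hints
        // that the Resource Aware Scheduler uses when packing executors onto nodes.
        builder.setSpout("spout", new MySpout(), 2)
               .setCPULoad(50.0)        // % of one core
               .setMemoryLoad(512.0);   // on-heap MB
        builder.setBolt("bolt", new MyBolt(), 4)
               .setCPULoad(25.0)
               .setMemoryLoad(256.0)
               .shuffleGrouping("spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        // The cluster itself must run with
        //   storm.scheduler: org.apache.storm.scheduler.resource.ResourceAwareScheduler
        // in storm.yaml for these hints to be taken into account.
        StormSubmitter.submitTopology("ras-demo", conf, builder.createTopology());
    }
}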

As for Storm 2.0, I know they’re improving RAS, but unfortunately I haven’t dug 
in yet to understand how.

Hope I answered your question,


--
Alessio Pagliari
Scale Team, PhD Student
Université Côte d’Azur, CNRS, I3S



> On 12 Apr 2018, at 20:51, Ethan Li <ethanopensou...@gmail.com> wrote:
> 
> Hi Daniel,
> 
> I am not sure if I understand your questions correctly. But will the resource 
> aware scheduler help? 
> https://github.com/apache/storm/blob/master/docs/Resource_Aware_Scheduler_overview.md
>  
> <https://github.com/apache/storm/blob/master/docs/Resource_Aware_Scheduler_overview.md>
> 
> 
> Thanks
> Ethan
> 
> 
>> On Apr 12, 2018, at 1:45 PM, Hannum, Daniel <daniel_han...@premierinc.com 
>> <mailto:daniel_han...@premierinc.com>> wrote:
>> 
>> I think I know the answer, but I can’t find docs, so check my thinking 
>> please.
>>  
>> We’re going to be going from 1 topology to maybe 5 on our (v1.1.x) cluster. 
>> I think the way I share the cluster is by setting NumWorkers() on all the 
>> topologies so we divide the available JVM’s up. If that is true, then don’t 
>> we have the problem that I’m tying up resources if one is idle? Or that I 
>> can’t move a JVM from topology A to topology B if B is under load?
>>  
>> So my questions are: 
>> Do I understand this correctly?
>> Is there any way to improve this situation besides just getting more hosts 
>> and being ok with some portion of them being idle?
>> Is this going to get better in Storm 2?
>>  
>> Thanks!
> 



Re: Storm throughput

2018-03-30 Thread Alessio Pagliari
They surely ran on a far more powerful cluster, but the topology is composed of 
just one spout: no parallelism, no bolts, a single worker, so one thread in one 
JVM. Even if I had 100 cores like them, it shouldn't make any difference. 
Please correct me if I'm wrong.

Such a topology assigns its only spout to a worker on one node, so the 
multi-node cluster is irrelevant. As for the number of cores, a single executor 
cannot run on multiple cores at the same time, since it is not a multi-threaded 
process.

Is there some Storm or Java behavior that I'm not aware of?

Thank you,

Alessio


On Mar 30, 2018, 4:28 PM, at 4:28 PM, Jacob Johansen <johansenj...@gmail.com> 
wrote:
>for their test, they were using 4 worker nodes (servers) each with
>24vCores
>for a total of 96vCores.
>Most laptops max out at 8vCores and are typically at 4-6vCores
>
>Jacob Johansen
>
>On Fri, Mar 30, 2018 at 9:18 AM, Alessio Pagliari
><pagli...@i3s.unice.fr>
>wrote:
>
>> Hi everybody,
>>
>> I’m trying to do some preliminary tests with storm, to understand how
>far
>> it can go. Now I’m focusing on trying to understand which is his
>maximum
>> throughput in terms of tuples per second. I saw the benchmark done by
>the
>> guys at Hortonworks (ref: https://it.hortonworks.
>> com/blog/microbenchmarking-storm-1-0-performance/) and in the first
>test
>> they reach a spout emission rate of 3.2 million tuples/s.
>>
>> I tried to replicate the test, a simple spout that emits continuously
>the
>> same string “some data”. Differently from them, I’m using Storm 1.1.1
>and
>> the storm cluster is set up on my laptop, anyway I’m just testing one
>spout
>> not an entire topology, but if you think that more configuration
>> information are needed, just ask.
>>
>> To compute the throughput I ask the total amount of tuples processed
>to
>> the UI APIs each 10s and I subtract it by the previous measure to
>have the
>> amount of tuples int the last 10s. What the mathematics give to me is
>> something around 32k tuples/s.
>>
>> I don’t think to be wrong saying that 32k is not even comparable to
>3.2
>> million. Is there something that I’m missing? Is it normal this
>output?
>>
>> Thank you for your help and for your time,
>>
>> Alessio
>>


Storm throughput

2018-03-30 Thread Alessio Pagliari
Hi everybody,

I’m trying to do some preliminary tests with Storm, to understand how far it 
can go. Right now I’m focusing on its maximum throughput in terms of tuples per 
second. I saw the benchmark done by the guys at Hortonworks (ref: 
https://it.hortonworks.com/blog/microbenchmarking-storm-1-0-performance/) and 
in the first test they reach a spout emission rate of 3.2 million tuples/s. 

I tried to replicate the test with a simple spout that continuously emits the 
same string “some data”. Unlike them, I’m using Storm 1.1.1 and the Storm 
cluster is set up on my laptop; in any case I’m only testing one spout, not an 
entire topology, but if you think more configuration information is needed, 
just ask. 
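
For concreteness, the spout is essentially this (a sketch of what I’m running, 
using nothing beyond the standard spout API):

import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Minimal benchmark spout: emits the same constant string as fast as nextTuple() is called.
// Tuples are emitted without a message id, so there is no acking overhead in this test.
public class ConstantSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        collector.emit(new Values("some data"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("str"));
    }
}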

To compute the throughput I query the UI API every 10 s for the total number of 
tuples processed and subtract the previous measurement to get the count for the 
last 10 s. The maths gives me something around 32k tuples/s.

I don’t think I’m wrong in saying that 32k is not even comparable to 3.2 
million. Is there something I’m missing? Is this output normal?

Thank you for your help and for your time,

Alessio

Re: Resource Aware Scheduler Kills Nimbus

2018-02-19 Thread Alessio Pagliari
Hi Wuyang,

This is a nice question; I didn’t find anything that computes it automatically 
yet. For now you have to test your topology, tuning the resource requirements 
until you find values you’re happy with.

Cheers,

Alessio

> On 10 Feb 2018, at 21:42, Wuyang Zhang  wrote:
> 
> Hi Jerry, 
> 
> I am not familiar with the pull operation on the Storm website... Can you do 
> it when you get a chance?
> 
> Also, I wonder how can I accurately specify the resource requirement, 
> especially the cpu demand, for each task?
> 
> Best, 
> Wuyang
> ᐧ
> 
> On Sat, Feb 10, 2018 at 3:21 PM, Jerry Peng  > wrote:
> Yup, an error in the documentation
> 
> Feel free to submit a pull request to fix the documentation.
> 
> Best,
> 
> Jerry
> 
> On Sat, Feb 10, 2018 at 1:42 PM, Wuyang Zhang  > wrote:
> Ok, I find it works by adding 
> storm.scheduler: org.apache.storm.scheduler.resource.ResourceAwareScheduler
> 
> instead of adding 
> 
> storm.scheduler: “org.apache.storm.scheduler.re 
> source.ResourceAwareScheduler”
> 
> 
> suggested by the website..
> 
> ᐧ
> 
> On Sat, Feb 10, 2018 at 2:22 PM, Wuyang Zhang  > wrote:
> When I redo the procedure, here is the more log
> 
> 2018-02-10 14:10:32.791 o.a.s.d.nimbus main [INFO] Using custom scheduler: 
> “org.apache.storm.scheduler.re 
> source.ResourceAwareScheduler”
> 2018-02-10 14:10:32.793 o.a.s.d.nimbus main [ERROR] Error on initialization 
> of server service-handler
> java.lang.ClassNotFoundException: “org.apache.storm.scheduler.re 
> source.ResourceAwareScheduler”
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
> ~[?:1.8.0_151]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_151]
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) 
> ~[?:1.8.0_151]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_151]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_151]
>   at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_151]
>   at org.apache.storm.util$new_instance.invoke(util.clj:1027) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$mk_scheduler.invoke(nimbus.cl 
> j:127) ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$nimbus_data.invoke(nimbus.clj:215) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at 
> org.apache.storm.daemon.nimbus$fn__11007$exec_fn__1370__auto11008.invoke(nimbus.clj:2451)
>  ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.7.0.jar:?]
>   at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
>   at clojure.core$apply.invoke(core.clj:630) ~[clojure-1.7.0.jar:?]
>   at 
> org.apache.storm.daemon.nimbus$fn__11007$service_handler__11040.doInvoke(nimbus.clj:2448)
>  ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.RestFn.invoke(RestFn.java:421) ~[clojure-1.7.0.jar:?]
>   at 
> org.apache.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:2536) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$_launch.invoke(nimbus.clj:2569) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$_main.invoke(nimbus.clj:2592) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.7.0.jar:?]
>   at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
>   at org.apache.storm.daemon.nimbus.main(Unknown Source) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 2018-02-10 14:10:32.806 o.a.s.util main [ERROR] Halting process: ("Error on 
> initialization")
> java.lang.RuntimeException: ("Error on initialization")
>   at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.RestFn.invoke(RestFn.java:423) ~[clojure-1.7.0.jar:?]
>   at 
> org.apache.storm.daemon.nimbus$fn__11007$service_handler__11040.doInvoke(nimbus.clj:2448)
>  ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.RestFn.invoke(RestFn.java:421) ~[clojure-1.7.0.jar:?]
>   at 
> org.apache.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:2536) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$_launch.invoke(nimbus.clj:2569) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at org.apache.storm.daemon.nimbus$_main.invoke(nimbus.clj:2592) 
> ~[storm-core-1.1.1.jar:1.1.1]
>   at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.7.0.jar:?]
>   at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
>   at org.apache.storm.daemon.nimbus.main(Unknown Source) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 
> I download a