> Something is definitely broken in your run or in your measurement method...
The problem doesn't lie in my measurement method; I double-checked it as you
suggested. Thank you for sharing the topology you used, though: with it I was
able to see where I was going wrong. Because I had based my topology on sample
benchmark topologies found online, I had setDebug(true) enabled, and printing
a log message for every emitted tuple was what slowed everything down. With
debug disabled, I now reach a spout emission rate of ~4.5M tuples per second.

Thank you all for the support.
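In case it helps anyone hitting the same wall, the change boils down to one
line in the topology config. Here is a minimal sketch of a spout-only
benchmark topology with debug off; the class names and the hand-rolled
constant spout are illustrative (the real storm-perf topology linked below
uses its own ConstSpout), not the exact code I ran:

import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SpoutOnlyBenchmark {

    // Minimal constant spout: emits the same string forever, unanchored.
    public static class ConstStringSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values("some data"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("str"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new ConstStringSpout(), 1);

        Config conf = new Config();
        conf.setDebug(false);  // the crucial bit: debug=true logs every emit and caps throughput
        conf.setNumWorkers(1);

        StormSubmitter.submitTopology("spout-only-benchmark", conf, builder.createTopology());
    }
}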
----------
Alessio Pagliari
Scale Team, PhD Student
Université Côte d’Azur, CNRS, I3S

> On 31 Mar 2018, at 07:19, Roshan Naik <[email protected]> wrote:
>
> Something is definitely broken in your run or in your measurement method...
> and it's not your hardware that is at fault. The machine on which those
> numbers were run had lots of cores, but the cores were not fast at all. Even
> my mid-2015 MacBook Pro has faster cores than that machine, which had old
> Intel CPUs.
>
> You may be making some mistakes in your calculations. Just run the topo for
> about 14 mins and take the 10-min window reading directly from the UI, then
> calculate the per-second throughput from that (that way you disregard the
> first 3 or 4 mins to allow for warm-up). Also, are you overriding any
> default settings?
>
> Here is the code for the topo that was used:
> https://github.com/apache/storm/blob/1.1.x-branch/examples/storm-perf/src/main/java/org/apache/storm/perf/ConstSpoutOnlyTopo.java
>
> -roshan
>
> On Friday, March 30, 2018, 8:24:39 AM PDT, Alessio Pagliari
> <[email protected]> wrote:
>
> Surely they work on a way more powerful cluster, but the topology is
> composed of just one spout. No parallelization, no bolts, for a total of
> one worker, so one thread in a JVM. Even if I had 100 cores like them, it
> shouldn't make any difference. Please correct me if I'm wrong.
>
> Such a topology will assign its only spout to a worker in a node, so the
> multi-node cluster is pointless. Meanwhile, regarding the number of cores,
> one executor cannot be on multiple cores at the same time, not being a
> multi-threaded process.
>
> Is there some Storm or Java behavior that I'm not aware of?
>
> Thank you,
>
> Alessio
>
> Sent from BlueMail
>
> On Mar 30, 2018, at 4:28 PM, Jacob Johansen <[email protected]> wrote:
>
> For their test, they were using 4 worker nodes (servers), each with 24
> vCores, for a total of 96 vCores. Most laptops max out at 8 vCores and are
> typically at 4-6 vCores.
>
> Jacob Johansen
>
> On Fri, Mar 30, 2018 at 9:18 AM, Alessio Pagliari <[email protected]> wrote:
>
> Hi everybody,
>
> I'm trying to do some preliminary tests with Storm, to understand how far
> it can go. For now I'm focusing on its maximum throughput in terms of
> tuples per second. I saw the benchmark done by the team at Hortonworks
> (ref: https://it.hortonworks.com/blog/microbenchmarking-storm-1-0-performance/)
> where, in the first test, they reach a spout emission rate of 3.2 million
> tuples/s.
>
> I tried to replicate the test: a simple spout that continuously emits the
> same string, "some data". Unlike them, I'm using Storm 1.1.1 and the Storm
> cluster is set up on my laptop. In any case I'm only testing one spout, not
> an entire topology, but if you think more configuration information is
> needed, just ask.
>
> To compute the throughput I request the total number of tuples emitted from
> the UI API every 10 s and subtract the previous reading, giving the number
> of tuples in the last 10 s. The arithmetic gives me something around 32k
> tuples/s.
>
> I don't think I'm wrong in saying that 32k is not even comparable to 3.2
> million. Is there something I'm missing? Is this output normal?
>
> Thank you for your help and for your time,
>
> Alessio
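P.S. For anyone who wants to reproduce the measurement: the 10-second
differencing I described in my first message (quoted above) boils down to
polling the Storm UI REST API and subtracting consecutive totals. A rough
sketch follows; the UI address, the topology id argument, and the regex scrape
of the first "emitted" counter are simplifying assumptions (good enough for a
single-spout topology), not the exact script I used:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmitRateProbe {

    // First "emitted" counter in the UI's JSON; with a single spout this is
    // the total we want. A real client would use a JSON parser instead.
    private static final Pattern EMITTED = Pattern.compile("\"emitted\"\\s*:\\s*(\\d+)");

    static long fetchEmitted(String url) throws IOException {
        try (InputStream in = new URL(url).openStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            Matcher m = EMITTED.matcher(s.hasNext() ? s.next() : "");
            if (!m.find()) {
                throw new IOException("no emitted counter in response");
            }
            return Long.parseLong(m.group(1));
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: topology id as shown by the UI,
        // e.g. "spout-only-benchmark-1-1522400000"
        String url = "http://localhost:8080/api/v1/topology/" + args[0];
        long prev = fetchEmitted(url);
        while (true) {
            Thread.sleep(10_000);              // sample every 10 s
            long cur = fetchEmitted(url);
            System.out.printf("~%d tuples/s%n", (cur - prev) / 10);
            prev = cur;
        }
    }
}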
