@Vijay: Memory is static. It is not per-bolt; it is just loading some large 
datasets. Refactoring is certainly an option, but it would require plenty of 
code changes and would also cause a lot of data to be transferred over the wire.



@John: That is correct, but the scenario is a bit different. I have not 
tested it, and I am not sure how it will work.

Let’s say I set a parallelism of 20 on the ‘HighTensionBolt’ bolt, with 20 
workers and exactly 1 slot on each machine. If I also have 10 other bolts, 
there is no way to ensure that all 20 instances of HighTensionBolt are 
distributed evenly.



Thanks
-Abhishek

From: [email protected]
Sent: Thursday, April 21, 2016 4:18 AM
To: [email protected]
Subject: Re: How to control Affinity of Executor to a Worker

That's interesting. When I match the number of executors to the number of 
workers, I always get exactly one executor per worker, at least with versions 
0.9.4-0.9.6. 
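
One possible explanation for this observation can be sketched with a simplified model. The assumption here (not something confirmed in this thread) is that executors are listed component by component and dealt out to workers in round-robin order. The class and method names are hypothetical, and the model counts executors per worker, not per machine.

```java
// Simplified model (an assumption, not Storm's actual scheduler code):
// executors are listed component by component and dealt out to workers
// in round-robin order.
final class RoundRobinModel {

    // Returns, for each worker, how many executors it receives from a
    // component whose executors occupy positions [start, start + count)
    // in the overall executor list.
    static int[] perWorkerCount(int start, int count, int workers) {
        int[] counts = new int[workers];
        for (int i = start; i < start + count; i++) {
            counts[i % workers]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 20 executors of one bolt starting at an arbitrary offset (37),
        // dealt out over 20 workers.
        int[] counts = perWorkerCount(37, 20, 20);
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        System.out.println("max executors per worker: " + max);
    }
}
```

Under this model, a contiguous block of 20 executors always lands on 20 different workers, whatever the offset. With more than one slot per machine, two of those workers can still share a machine.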

--John

On Apr 21, 2016, at 12:19 AM, Vijay Patil <[email protected]> wrote:
If your topology is the only topology running on the 20-node cluster, then I 
think you can reduce the slots per supervisor to 1 by listing just one port 
number under "supervisor.slots.ports" in storm.yaml. If you are running 
multiple topologies on this 20-node cluster, this solution may not work, and 
you would need something else, such as writing a custom metadata-aware 
scheduler by implementing backtype.storm.scheduler.IScheduler.
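
The core decision such a custom scheduler would have to make can be sketched without the Storm API. This is a minimal sketch, assuming the scheduler knows the free slot ports on each host; HeavyBoltPlacement and place are hypothetical names, and wiring the result into an IScheduler implementation is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Storm: picks a distinct host for each
// memory-heavy executor so that no host receives more than one of them.
final class HeavyBoltPlacement {

    // freeSlotsByHost maps a host name to its free slot ports.
    // Returns heavy executor index -> "host:port". Throws when fewer hosts
    // have a free slot than there are heavy executors to place.
    static Map<Integer, String> place(int heavyExecutors,
                                      Map<String, List<Integer>> freeSlotsByHost) {
        Map<Integer, String> assignment = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, List<Integer>> e : freeSlotsByHost.entrySet()) {
            if (next == heavyExecutors) {
                break; // every heavy executor already has a host
            }
            if (e.getValue().isEmpty()) {
                continue; // host has no free slot, skip it
            }
            // Use only the first free slot of this host for the heavy bolt.
            assignment.put(next++, e.getKey() + ":" + e.getValue().get(0));
        }
        if (next < heavyExecutors) {
            throw new IllegalStateException("Only " + next
                    + " hosts with a free slot for " + heavyExecutors
                    + " heavy executors");
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 20 hosts with 4 free slots each (port numbers are illustrative).
        Map<String, List<Integer>> free = new LinkedHashMap<>();
        for (int i = 1; i <= 20; i++) {
            free.put("host" + i, Arrays.asList(6700, 6701, 6702, 6703));
        }
        Map<Integer, String> result = place(20, free);
        System.out.println(result.get(0) + " ... " + result.get(19));
    }
}
```

The remaining slots on each host stay available for the lightweight bolts, which could be left to the default scheduling.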

But I think there may be some scope for refactoring that HighTensionBolt in 
order to reduce memory usage. Does all the memory used by that bolt remain 
static across tuples? Or is it the "execute()" method that consumes that much 
memory each time and discards it once tuple processing is done? 

On 21 April 2016 at 06:47, <[email protected]> wrote:
 
Hi,
 
I have set up 4 workers on each machine. In my topology, there is one bolt that 
needs a lot of memory, so ideally I don’t want more than 1 instance of it 
scheduled on any machine. Computationally it is pretty fast, so it can 
sustain good throughput when running. Let’s call it BoltHighTension.
My other bolts, however, are very lightweight, and I can give them a lot of 
parallelism. 
 
How do I ensure that, with 20 supervisors, I don’t get more than 1 
‘BoltHighTension’ on each machine? I want to give this bolt a parallelism hint 
of 20.
However, I notice that sometimes more than 1 such instance gets allocated to 
the same machine. (A machine can handle 2, but the performance hit from paging 
becomes a problem.)
 
Thanks for your help/advice/hints.
 
Thanks
-Abhishek
 