itskals commented on pull request #29413: URL: https://github.com/apache/spark/pull/29413#issuecomment-673475255
I was thinking that although Spark has many queues, in many cases not all of them are used to the same level at the same time. When some queues are heavily loaded, others may be consuming their events faster. If this is true, then instead of giving each queue a separate, rigid size, what if there were a shared pool from which a queue can loan event holders, returning them to the pool once the events are processed? This pool is not a memory-accounting mechanism; it is just a counter (probably atomic). Say that for a driver memory of X GB we allocate N event holders to be shared by all queues; N is then the pool size. When an event needs to be enqueued in a queue, the queue asks the pool whether an event holder is available. If yes (based on the current usage), the queue enqueues the event; if not, the event is dropped. The idea is a middle ground between a restricted queue size and an infinite-capacity queue: queues are not statically bound to a size but have more flexibility to grow, while there is still a softer high-water mark (N) beyond which they cannot grow. Let me know what you think... @SaurabhChawla100 @Ngone51 @tgravescs
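A minimal sketch of the idea above, to make it concrete. All names here (`EventHolderPool`, `tryAcquire`, `release`) are hypothetical and not part of Spark's actual listener-bus API; the point is only that the pool is a single atomic counter with a compare-and-set loop, shared by all queues:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared pool of event holders: one atomic counter, capacity N.
class EventHolderPool {
    private final int capacity;                       // N: the soft high-water mark
    private final AtomicInteger used = new AtomicInteger(0);

    EventHolderPool(int capacity) { this.capacity = capacity; }

    // Try to loan one event holder; false means the caller should drop the event.
    boolean tryAcquire() {
        while (true) {
            int cur = used.get();
            if (cur >= capacity) return false;        // pool exhausted
            if (used.compareAndSet(cur, cur + 1)) return true;
        }
    }

    // Return a holder to the pool once the event has been processed.
    void release() {
        used.decrementAndGet();
    }
}

public class PoolDemo {
    public static void main(String[] args) {
        EventHolderPool pool = new EventHolderPool(2); // N = 2 for illustration
        System.out.println(pool.tryAcquire());         // true
        System.out.println(pool.tryAcquire());         // true
        System.out.println(pool.tryAcquire());         // false: event would be dropped
        pool.release();                                // an event finished processing
        System.out.println(pool.tryAcquire());         // true again
    }
}
```

With this scheme, a lightly loaded queue releases holders quickly and a busy queue can temporarily borrow more than its "fair share", which is the flexibility described above; the trade-off is that one pathological queue can starve the others, which a per-queue cap avoids.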
