Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138

@koeninger I'm not sure I follow, but are you saying that a single executor may serve multiple queries (multiple jobs) concurrently? I honestly hadn't noticed that. If that is going to be a problem, we should add something (could we get the query id at that point?) to the cache key to differentiate consumers.

If we want to avoid extra seeking due to differing offsets, consumers simply should not be reused across multiple queries, and that's just a matter of the cache key. If you are instead thinking of sharing consumers among multiple queries in order to reuse the connection to Kafka, I think extra seeking is unavoidable (and I guess the fetched data would be an even more critical issue, unless we never reuse it after the consumer is returned to the pool). If seeking is a light operation, we could even go so far as to reuse only the connection (not the position we already sought to), always resetting the position (and maybe the fetched data) when borrowing a consumer from the pool or returning it.

Btw, the rationale for this patch is not to solve the issue you're referring to. This patch is also based on #20767, but it addresses other improvements pointed out in the comments: adopting a pool library rather than reinventing the wheel, and enabling metrics for the pool.

I'm not sure the issue you're referring to is a serious one (a show-stopper): if it were, someone should have handled it once we became aware of it back in March, or at least a relevant JIRA issue should have been filed with a detailed explanation. I'd like to ask you to handle (or file) the issue, since you may know it best.
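To illustrate the cache-key idea, here is a minimal sketch in plain Java (class and field names are hypothetical, not the actual Spark internals): once the query id is part of the key, two concurrent queries reading the same topic-partition map to distinct pool entries, so they never share a consumer or step on each other's fetch position.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a consumer cache key that includes the query id,
// so consumers are never shared between concurrently running queries on
// one executor. Not the actual Spark implementation.
public class ConsumerCacheSketch {

    static final class CacheKey {
        final String queryId;   // differentiates concurrent queries
        final String groupId;
        final String topic;
        final int partition;

        CacheKey(String queryId, String groupId, String topic, int partition) {
            this.queryId = queryId;
            this.groupId = groupId;
            this.topic = topic;
            this.partition = partition;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey k = (CacheKey) o;
            return partition == k.partition
                && queryId.equals(k.queryId)
                && groupId.equals(k.groupId)
                && topic.equals(k.topic);
        }

        @Override public int hashCode() {
            return Objects.hash(queryId, groupId, topic, partition);
        }
    }

    public static void main(String[] args) {
        Map<CacheKey, String> pool = new HashMap<>();
        // Two queries on the same topic-partition get distinct keys,
        // so each keeps its own consumer (and its own fetch position).
        pool.put(new CacheKey("query-1", "group-a", "events", 0), "consumer-1");
        pool.put(new CacheKey("query-2", "group-a", "events", 0), "consumer-2");
        System.out.println(pool.size()); // prints 2: no sharing across queries
    }
}
```

If sharing the underlying Kafka connection across queries were still desired, the alternative mentioned above would instead drop the query id from the key and reset the consumer's position (and discard any buffered fetched data) on every borrow or return.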