Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/22138
  
    @koeninger 
    
    I'm not sure I follow: are you saying that a single executor can run tasks 
for multiple queries (multiple jobs) concurrently? I honestly hadn't considered 
that. If it is going to be a problem, we should add something (could we get the 
query id at that point?) to the cache key to differentiate consumers. If we 
want to avoid extra seeking due to differing offsets, consumers should simply 
not be reused across multiple queries, and that is just a matter of the cache 
key.
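    To make the idea concrete, here is a minimal sketch (the `CacheKey` fields 
and `queryRunId` are illustrative, not Spark's actual internal class): adding 
the query's run id to the key means two queries reading the same 
topic-partition never share a consumer.

```scala
// Hypothetical sketch, not Spark's actual internals: extend the consumer
// cache key with the query's run id so consumers are never shared across
// concurrently running queries.
case class CacheKey(groupId: String, topic: String, partition: Int, queryRunId: String)

object CacheKeyExample extends App {
  val a = CacheKey("spark-kafka-source", "events", 0, "query-A")
  val b = CacheKey("spark-kafka-source", "events", 0, "query-B")
  // Same topic-partition, different queries: the keys differ, so the pool
  // would hand out distinct consumers and no cross-query seeking occurs.
  assert(a != b)
}
```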
    
    If you are thinking about sharing consumers across multiple queries in 
order to reuse the connection to Kafka, then I think extra seeking is 
unavoidable (and I guess the fetched data would be the bigger issue, unless we 
never reuse it after the consumer is returned to the pool). If seeking is a 
light operation, we could even go with reusing only the connection (not the 
position we already sought to): always resetting the position (and maybe the 
fetched data as well) when borrowing a consumer from the pool or returning it.
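    A rough sketch of that "connection-only reuse" policy (the `PooledConsumer` 
and pool types here are hypothetical stand-ins, not the classes in this patch): 
the pool unconditionally re-seeks on borrow, so a stale position left by a 
previous query is never trusted.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a pooled Kafka consumer: only the connection
// (identified here by topic-partition) is considered reusable state.
class PooledConsumer(val topicPartition: String) {
  private var position: Long = -1L // -1 means unknown / stale
  def seek(offset: Long): Unit = { position = offset }
  def currentPosition: Long = position
}

class ConsumerPool {
  private val idle = mutable.Map.empty[String, PooledConsumer]

  // Borrow: reuse an idle consumer's connection if one exists, but always
  // reset its position to the caller's requested start offset.
  def borrow(tp: String, startOffset: Long): PooledConsumer = {
    val c = idle.remove(tp).getOrElse(new PooledConsumer(tp))
    c.seek(startOffset) // unconditional seek: never trust the old position
    c
  }

  def giveBack(c: PooledConsumer): Unit = idle(c.topicPartition) = c
}
```

    Whether this is acceptable depends on how cheap seek really is; if fetched 
data also has to be discarded on every borrow, much of the benefit of pooling 
is lost.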
    
    Btw, the rationale of this patch is not to solve the issue you're referring 
to. This patch is also based on #20767, but it addresses other improvements 
pointed out in the comments there: adopting a pool library instead of 
reinventing the wheel, and enabling metrics for the pool.
    
    I'm also not sure the issue you're referring to is a serious one (a 
show-stopper): if it were that serious, someone should have handled it once we 
became aware of it back in March, or at least a relevant JIRA issue should have 
been filed with a detailed explanation. Since you likely know the issue best, 
may I ask you to handle it (or at least file it)?

