Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/22138
  
    I thought the whole reason the caching was changed from the initial naive
    approach to the current approach in master was that people were running
    jobs that scheduled multiple consumers for the same topic partition
    and group.
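The concern above is that two tasks may request a consumer for the same (topic partition, group) key at the same time. Here is a minimal Python sketch of a cache that tolerates this by handing out a separate, uncached consumer when the cached one is already in use. `ConsumerCache` and `FakeConsumer` are hypothetical names, not Spark's actual classes, and no real Kafka client is involved.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Tuple

CacheKey = Tuple[str, int, str]  # (topic, partition, group_id)


@dataclass
class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; not a real client."""
    key: CacheKey
    cached: bool
    in_use: bool = False
    closed: bool = False


class ConsumerCache:
    """Hypothetical cache keyed by (topic, partition, group_id)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: Dict[CacheKey, FakeConsumer] = {}

    def acquire(self, topic: str, partition: int, group_id: str) -> FakeConsumer:
        key = (topic, partition, group_id)
        with self._lock:
            consumer = self._cache.get(key)
            if consumer is None:
                consumer = FakeConsumer(key, cached=True)
                self._cache[key] = consumer
            if consumer.in_use:
                # Another task already holds the cached consumer for this key:
                # hand out a separate, uncached one instead of sharing it.
                return FakeConsumer(key, cached=False, in_use=True)
            consumer.in_use = True
            return consumer

    def release(self, consumer: FakeConsumer) -> None:
        with self._lock:
            if consumer.cached:
                consumer.in_use = False  # back in the pool for the next task
            else:
                consumer.closed = True  # throwaway consumers are closed after use
```

Usage: two concurrent `acquire("events", 0, "group-a")` calls return one cached consumer and one uncached consumer, so the two tasks never share the same instance.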
    
    
    
    On Sun, Aug 19, 2018 at 7:51 PM, Jungtaek Lim <[email protected]>
    wrote:
    
    > @koeninger <https://github.com/koeninger>
    > I'm not sure I understood your point correctly. This patch is based on some
    > assumptions, so please correct me if I'm missing something. The assumptions are:
    >
    >    1. There are actually never multiple consumers for a given key working
    >       at the same time. The cache key contains the topic partition as well
    >       as the group id. Even if the query does a self-join and therefore
    >       reads the same topic in two different sources, I think the group id
    >       should be different.
    >    2. In the normal case the offsets will be contiguous, which is why the
    >       cache should help. In the retry case this patch invalidates the
    >       cache, same as the current behavior, so it should start from scratch.
    >
    > (Btw, I'm curious which is more expensive: reusing the pooled object but
    > resetting the Kafka consumer, or invalidating pooled objects and starting
    > from scratch. The latter feels safer, but if resetting only needs an extra
    > seek instead of reconnecting to Kafka, it could be improved and the former
    > would be cheaper. I feel this is out of scope for my PR, though.)
    >
    > This patch keeps most of the current behavior, except in two spots, I
    > think. I already commented on one spot explaining why I changed the
    > behavior, and I'll comment on the other spot for the same reason.
    >
    > View it on GitHub: <https://github.com/apache/spark/pull/22138#issuecomment-414164788>
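The second assumption in the quoted reply is that, in the normal case, each task resumes at the offset where the previous one stopped, so a cached consumer avoids a seek, while a retried task discards the cache and starts from scratch. The following is a minimal sketch of that behavior under stated assumptions: `RetryAwareCache`, `FakeConsumer`, and `attempt_number` are hypothetical names (`attempt_number` stands in for the task attempt number), `fetch` only counts seeks instead of talking to Kafka, and the in-use handling from the concurrency discussion is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

CacheKey = Tuple[str, int, str]  # (topic, partition, group_id)


@dataclass
class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer that only counts seeks."""
    next_offset: int = -1
    seeks: int = 0

    def fetch(self, offset: int) -> int:
        if offset != self.next_offset:
            self.seeks += 1  # non-contiguous request: a real consumer would seek
        self.next_offset = offset + 1
        return offset  # pretend the record value is its offset


class RetryAwareCache:
    """Hypothetical cache that reuses consumers except on task retries."""

    def __init__(self) -> None:
        self._cache: Dict[CacheKey, FakeConsumer] = {}
        self.created = 0

    def get(self, key: CacheKey, attempt_number: int) -> FakeConsumer:
        if attempt_number > 0:
            # Retried task: drop whatever was cached and start from scratch.
            self._cache.pop(key, None)
        if key not in self._cache:
            self._cache[key] = FakeConsumer()
            self.created += 1
        return self._cache[key]
```

Usage: a first task reading offsets 10 to 14 pays one seek; a second task on attempt 0 that continues at offset 15 reuses the same consumer with no further seek, whereas a retry (attempt 1) gets a fresh consumer.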


