Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21500
  
    @TomaszGaweda @aalobaidi 
    Please correct me if I'm missing something here.
    
    At the start of every batch, the state store loads the previous version of 
state so that it can be read and written. If we unload all versions "after 
committing", the cache will no longer contain the previous version of state, 
and the store will have to load it by reading files, adding huge latency at 
the start of each batch. That's why I described three cases earlier, to avoid 
loading state from files when starting a new batch.
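    To illustrate the trade-off, here is a deliberately simplified sketch (not the real HDFSBackedStateStoreProvider code; names like `loadedMaps` and `readFromFiles` are illustrative stand-ins) showing why unloading every version right after commit forces a file read at the next batch start:

    ```scala
    import scala.collection.mutable

    object VersionCacheSketch {
      type StateMap = Map[String, Int]

      // In-memory cache of recently committed state versions.
      private val loadedMaps = mutable.Map.empty[Long, StateMap]
      var fileLoads = 0 // counts the expensive fallback path

      // Stand-in for replaying snapshot/delta files from HDFS.
      private def readFromFiles(version: Long): StateMap = {
        fileLoads += 1
        Map.empty
      }

      // Commit keeps the new version in the cache.
      def commit(version: Long, state: StateMap): Unit =
        loadedMaps(version) = state

      // Batch start: reuse the cached version if present, else read files.
      def loadVersion(version: Long): StateMap =
        loadedMaps.getOrElse(version, readFromFiles(version))

      // Aggressive eviction "after committing", as proposed.
      def unloadAll(): Unit = loadedMaps.clear()
    }
    ```

    With this model, calling `unloadAll()` after every commit means every subsequent `loadVersion` takes the `readFromFiles` path, which is exactly the added per-batch latency described above.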
    
    Please apply #21469 manually and see how much memory 
HDFSBackedStateStoreProvider consumes due to storing multiple versions (it 
will show the state size of the latest version as well as the overall state 
size in the cache). Please also measure and share latency numbers showing 
what it is now and what it would be after the patch. We always have to ask 
ourselves whether we are addressing the issue correctly.

