HeartSaVioR commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore 
checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-525578717
 
 
   Thanks for pointing out about "recover live entities". I've commented in 
your comment in the doc, but elaborate again to get some guides about what we 
should do from experts for event logging part.
   
   I thought `live*` fields are working as "cache" (code comment made me a bit 
confused). If it's designed as a "cache" instead of playing as truth of source, 
it should also load from KVStore if the entity is not in cache, and fill the 
cache if found. Looks like it's not: AppStatusListener doesn't seek KVStore if 
it can't find the entity from live* fields.
   
   Some part of SQLAppStatusListener deals with that, though it doesn't deal 
with cache behavior entirely.
   
https://github.com/apache/spark/blob/8848af263570a82a985037a95468efff050c61b0/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L79-L101
   
   It seems that AppStatusListener always assumes KVStore is empty and we're 
replaying all events. As we're breaking the assumption, at least as of now I'd 
say it's a bug and we should fix the bug, regardless of how to snapshot - once 
we load KVStore and provide to AppStatusListener we should fix.
   
   "initLiveEntities" in your PR seems to deal with this, so hopefully you can 
create smaller PR to fix the bug first. (You could test the functionality 
without snapshotting, as once you know the type of entities you could retrieve 
entities from KVStore.) 
   
   One thing I would like to modify is, I don't think we should initialize only 
for certain condition. We should just initialize all the time for startup of 
AppStatusListener. It should be fast enough if KVStore is empty, and it must be 
done regardless of flags if KVStore is not empty - correctness matters, speed 
doesn't matter here. SQLAppStatusListener may also need to be fixed as well.
   
   Once liveEntities can be restored from KVStore, it would also work for 
incremental replaying, as well as incremental snapshotting (replaying some more 
events and snapshotting again). `live` flag makes thing a bit complicated due 
to "maybeUpdate" and "liveUpdate", but AppStatusListener guarantees to flush 
these live entities to KVStore eventually, at least when ElementTrackingStore 
is closed. I guess it would be pretty much helpful if ElementTrackingStore 
provides "flush" to force flush without closing it.
   
   @vanzin 
   As you're expert on this area, could you go through this and provide 
guidance on the approach? 
   
   Feels like we could deal with below steps:
   1. address live entities initialization in 
AppStatusListener/SQLAppStatusListener with non-empty KVStore (based on the 
part of this PR, maybe @Ngone51 would be better to deal with this)
   2. address snapshotting of KVStore
   3. based on 1 and 2, address both issues concurrently, SPARK-28867 
(@Ngone51), SPARK-28594 (me)
   
   The approach would be much better than the origin plan when you prefer 
smaller PR, as SPARK-28594 will be split to 3 sub-issues instead of 2.
   
   Let me know it makes sense to you. Thanks in advance!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to