Ngone51 commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore checkpoint to speed up replay log file in HistoryServer URL: https://github.com/apache/spark/pull/25577#issuecomment-532737445 > For snapshotting in driver, I also considered both sides (driver/SHS) in design phase but took SHS because listener is the thing which must not be blocked by long operation. Well, in SPARK-28867, I don't think there's a choice of creating snapshot on driver or SHS. As I mentioned above, there can be a case where SHS not running before an application finished. So, I think it can only create snapshot on driver. > Either you'll end up with cloning InMemoryStore and snapshotting with cloned one (requires more memory on driver, but unless deep copy is required, then maybe applicable), or block listener to process next event until snapshot has been taken. You can't take snapshot asynchronously with live AppStatusListener. Yes, you're right. And in this PR, we don't have a copy for InMemoryStore and we just block(but with optimizations) during snapshotting. I also realized that listener couldn't take too much time while processing events. So, in this PR, we use a separate thread to do snapshot and we don't block for frequent events(e.g. `SparkListenerTaskStart`, `SparkListenerTaskEnd`). Snapshotting is more likely to be triggered within frequent incoming events, and this would make blocking less happening. > ideally you're ensuring your design is already reviewed internally in your team, but that's optional. Actually, I've reached an agreement with @gengliangwang before I made this PR. > I can see the major benefits from taking snapshot from driver, so if you can prove it doesn't bring considerable latency in listener side and no memory pressure, it would be better approach. As you know, this's still a work in process and I only wrote a prototype yet in order to validate the thoughts. And there still be some work to be done before this PR gets perfectly work. The one is that correctly handling different behavior between AppStatusListener and EventLoggingListener, which I mentioned above. Maybe, we could resolve it with more careful control at code level, but I'm afraid the whole idea is not good enough and codes will result in buggy finally. All in all, this still couldn't be a competed implementation yet. So, for me, it still far from testing of latency and memory.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
