mridulm commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308113372
To comment on the proposal in the description, based on past prototypes I have worked on or seen: maintaining state at the driver in a disk-backed store and copying it to DFS has a few drawbacks, particularly for larger applications. Such setups are not very robust to application crashes, interact in nontrivial ways with the shutdown hook (HDFS failures), and increase application termination time during graceful shutdown. Depending on application characteristics, a disk-backed store can affect driver performance positively or negatively, and which way it would go was difficult to predict. Positively, because updates are faster thanks to the index, which the in-memory store lacked (when I added an index there, memory requirements increased :-( ). Negatively, because of increased disk activity.
