[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-658451540 Thanks! Merged into master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-658208340 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-657921314 @tgravescs Do you plan another round of review, or OK as it is? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-656593776 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-656388861 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-656158596 Yeah I’m OK either way. I think both ways wouldn’t bring issues in reality. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-655213773 Thanks for the update. There's a case it goes up closer to 10x but not really 10x, which seems that 10x is safe one to apply. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-655125113 @baohe-zhang Thanks for the update. This is really helpful. So the small event log file shows there's a chance the ratio can be beyond 1/2. Personally it seems OK to simply allow such case, as these small event logs can be migrated to the LevelDB KV store fast enough, and the memory usage would hold shortly. That said, it'd be simpler if we can just go through with LevelDB KV store for small event logs, no need to run background thread per app log. Either is fine for me. For latter the lower bound should be configured against uncompressed log, and we should do the similar estimation on compressed log before applying. That's another tricky one, but probably good to have. (I wouldn't run background thread for event log which can be loaded in a couple of seconds) @tgravescs WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-654866771 That said, probably we also haven't check how much hybrid KV store will help on the smaller app event log (like 100mb or even smaller in uncompressed). If that turns out to not help enough (either hybrid KV store only reduces a couple of seconds or replaying is fast enough without hybrid KV store), we can even exclude them on hybrid KV store path via having a lower bound (considering compression), as I assume the overhead of hybrid KV store is not that tiny. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-654853105 So if I understand correctly, what we want to confirm is that the (size+compression)-memory ratio goes linearly on the number of tasks or not. Like short running interactive query (small number of tasks in overall) vs long running batch query (vs long running streaming query). I guess latter has been addressed in recent experiments, but former hasn't been experimented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-654561269 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-653246229 Let's try to overestimate for memory usage, as it's more critical than estimating disk usage (even for disk usage I think overestimating a bit is safer) and might lead to OOME if end users configure too tight on remaining heap memory. zstd deserves to get at least 7x in the latest experiment, right? IMHO probably safer to apply 10x, and also apply 4x for other compression codecs as well. Let's hear @tgravescs voice on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641803852 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641693298 cc. @vanzin @squito Please take a look at this PR - looks like this is a notable improvement on reducing load latency in SHS. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-641028299 @baohe-zhang Just FYI, probably you may want to mention me - looks like you've been mentioned other folk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-638538734 Oh OK there's an exception... My bad. Thanks for noticing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-636843886 I'm not missing this PR, but Spark 3.0.0 must be the highest priority as well as this change is non-trivial (800+ lines dealing with multi-threads) I'm not sure I can take a deep look at this soon. Btw, I haven't changed my voice - personally I still feel it a bit complicated, as I commented earlier on https://github.com/apache/spark/pull/28412#pullrequestreview-405371683. I don't think hybrid KV Store will be used on general purpose, as its purpose is to help to replay the events faster. Could we leverage the fact to simplify the case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-636833377 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-624964828 The idea is similar with HistoryServerDiskManager so makes sense in general. We may need to get concrete answers for these questions to go forward: 1. How we will guarantee these area of memory is used only for Hybrid KV store to prevent OOM? (Or no guard and end users have to deal with providing enough memory on heap?) 2. How to calculate approximate memory usage? I guess it would be safe to assume the approximate size as event log file size, but it would take over huge memory for single app. (That may not be a problem on safety perspective, but pretty less efficient.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-624441325 I'm not sure that's fairly simple to do. Concurrent load of applications can be happening in SHS, right? The default value of `spark.history.retainedApplications` is 50, which means maximum 50 apps can be loaded into cache at the same time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-624434103 Let's discuss first with the plan how to address the major concerns in comments, especially how to restrict the overall memory usage. I think that's a blocker for the production use. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
HeartSaVioR commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-624415280 Also worth to mention that memory usage should be under control; there's no restriction for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org