[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-14 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-658451540


   Thanks! Merged into master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-14 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-658208340


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-13 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-657921314


   @tgravescs Do you plan another round of review, or OK as it is?






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-10 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656593776


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-09 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656388861


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-09 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656158596


   Yeah, I'm OK either way. I don't think either approach would cause issues in 
practice.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-07 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-655213773


   Thanks for the update. There's a case where it gets close to 10x without 
actually reaching it, so 10x looks like a safe factor to apply.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-07 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-655125113


   @baohe-zhang 
   Thanks for the update. This is really helpful. So the small event log file 
shows there's a chance the ratio can go beyond 1/2.
   
   Personally, it seems OK to simply allow such cases, as these small event 
logs can be migrated to the LevelDB KV store quickly, so the memory would only 
be held briefly. That said, it'd be simpler to go straight to the LevelDB KV 
store for small event logs, with no need to run a background thread per app log.
   
   Either is fine with me. For the latter, the lower bound should be configured 
against the uncompressed log size, and we should do a similar estimation for 
compressed logs before applying it. That's another tricky one, but probably good 
to have. (I wouldn't run a background thread for an event log that can be loaded 
in a couple of seconds.)
   
   @tgravescs WDYT?
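
   The lower-bound idea above could be sketched roughly as follows. This is only an illustration: the function name, the `lower_bound_uncompressed_bytes` threshold, and the `assumed_expansion` factor are all hypothetical and not taken from the PR.

```python
from typing import Optional


def use_hybrid_store(log_bytes: int,
                     codec: Optional[str],
                     lower_bound_uncompressed_bytes: int,
                     assumed_expansion: float) -> bool:
    """Return True when the log looks large enough to justify the hybrid store.

    All parameters are hypothetical knobs for illustration only.
    """
    # For a compressed log, estimate the uncompressed size first so that the
    # lower bound is always compared against an uncompressed figure.
    if codec is None:
        estimated_uncompressed = float(log_bytes)
    else:
        estimated_uncompressed = log_bytes * assumed_expansion
    # Logs below the bound would go straight to the disk-backed store,
    # with no background thread needed.
    return estimated_uncompressed >= lower_bound_uncompressed_bytes
```

   With such a check, a small log skips the in-memory stage entirely, which is the simpler of the two options discussed above.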






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-07 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654866771


   That said, we probably also haven't checked how much the hybrid KV store 
helps with smaller app event logs (like 100 MB or even smaller, uncompressed). 
If it turns out not to help enough (either the hybrid KV store only saves a 
couple of seconds, or replaying is fast enough without it), we can even exclude 
such logs from the hybrid KV store path by having a lower bound (taking 
compression into account), as I assume the overhead of the hybrid KV store is 
not that tiny.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-07 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654853105


   So if I understand correctly, what we want to confirm is whether the 
(size+compression)-to-memory ratio scales linearly with the number of tasks. 
For example, a short-running interactive query (a small number of tasks overall) 
vs. a long-running batch query (vs. a long-running streaming query). I guess 
the latter has been covered by the recent experiments, but the former hasn't 
been tested.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-06 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654561269


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-07-02 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-653246229


   Let's try to overestimate memory usage, as it's more critical than 
estimating disk usage (even for disk usage I think overestimating a bit is 
safer): underestimating might lead to an OOME if end users configure the 
remaining heap memory too tightly.
   
   zstd needs at least 7x according to the latest experiment, right? IMHO it's 
probably safer to apply 10x, and to apply 4x for the other compression codecs 
as well. Let's hear @tgravescs's opinion on this.
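
   A minimal sketch of the overestimation proposed above, assuming the multipliers from this comment (10x for zstd, 4x for any other compression codec). The function and constant names are hypothetical, and uncompressed logs are deliberately left out because this comment does not give a factor for them.

```python
ZSTD_FACTOR = 10         # proposed above for zstd-compressed event logs
OTHER_CODEC_FACTOR = 4   # proposed above for every other compression codec


def estimate_memory_bytes(compressed_log_bytes: int, codec: str) -> int:
    """Overestimate the heap needed to replay a compressed event log.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if compressed_log_bytes < 0:
        raise ValueError("log size must be non-negative")
    # Pick the deliberately pessimistic multiplier for the codec in use.
    factor = ZSTD_FACTOR if codec.lower() == "zstd" else OTHER_CODEC_FACTOR
    return compressed_log_bytes * factor
```

   Erring on the high side means an app may be sent to the disk-backed store more often than strictly necessary, which costs latency rather than risking an OOME.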






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-10 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641803852


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641693298


   cc. @vanzin @squito
   Please take a look at this PR. It looks like a notable improvement in load 
latency for the Spark History Server (SHS).






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-09 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-641028299


   @baohe-zhang Just FYI, you probably meant to mention me; it looks like 
you've been mentioning someone else.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-03 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-638538734


   Oh OK there's an exception... My bad. Thanks for noticing.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-01 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-636843886


   I haven't forgotten this PR, but Spark 3.0.0 has to be the highest 
priority, and since this change is non-trivial (800+ lines dealing with 
multiple threads), I'm not sure I can take a deep look at it soon.
   
   Btw, I haven't changed my opinion: personally I still find it a bit 
complicated, as I commented earlier in 
https://github.com/apache/spark/pull/28412#pullrequestreview-405371683. I don't 
think the hybrid KV store will be used for general purposes, as its purpose is 
to help replay events faster. Could we leverage that fact to simplify the 
design?






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-06-01 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-636833377


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-06 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624964828


   The idea is similar to HistoryServerDiskManager, so it makes sense in 
general. We may need concrete answers to these questions to move forward:
   
   1. How will we guarantee that this area of memory is used only for the 
hybrid KV store, to prevent OOM? (Or is there no guard, so end users have to 
provide enough heap memory themselves?)
   
   2. How do we calculate the approximate memory usage? I guess it would be 
safe to assume the approximate size is the event log file size, but that would 
reserve a huge amount of memory for a single app. (That may not be a problem 
from a safety perspective, but it's pretty inefficient.)
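
   One possible shape for the guard asked about in question 1 is a lease-style budget. The sketch below is purely illustrative: `MemoryBudget` and its methods are hypothetical names, and it only tracks bookkeeping against a cap; it cannot stop other components from allocating heap.

```python
import threading


class MemoryBudget:
    """Tracks leased bytes against a fixed cap (hypothetical helper)."""

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._used_bytes = 0
        self._lock = threading.Lock()

    def try_lease(self, estimated_bytes: int) -> bool:
        """Reserve memory for one app; return False if the cap would be exceeded."""
        with self._lock:
            if self._used_bytes + estimated_bytes > self._max_bytes:
                return False
            self._used_bytes += estimated_bytes
            return True

    def release(self, estimated_bytes: int) -> None:
        """Give back a previous lease, e.g. once the data has moved to disk."""
        with self._lock:
            self._used_bytes = max(0, self._used_bytes - estimated_bytes)

    @property
    def used_bytes(self) -> int:
        with self._lock:
            return self._used_bytes
```

   A caller whose `try_lease` is refused would fall back to loading the app directly into the disk-backed store. How good the guard is depends entirely on the estimate passed in, which is what question 2 is about.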
   






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624441325


   I'm not sure that's simple to do. Applications can be loaded concurrently 
in SHS, right? The default value of `spark.history.retainedApplications` is 50, 
which means up to 50 apps can be loaded into the cache at the same time.
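
   To make the concern concrete, here is a back-of-the-envelope sketch of the worst case with no restriction in place, assuming every cached application is held in memory at once. The per-app figure is a hypothetical input, not a measurement from this PR, and the function name is made up for illustration.

```python
# Default of spark.history.retainedApplications, as stated in the comment above.
RETAINED_APPLICATIONS = 50


def worst_case_memory_bytes(per_app_estimate_bytes: int,
                            concurrent_apps: int = RETAINED_APPLICATIONS) -> int:
    """Upper bound on heap use if every cached app is in memory simultaneously."""
    return per_app_estimate_bytes * concurrent_apps
```

   For example, with a hypothetical estimate of 2 GiB per app, `worst_case_memory_bytes(2 * 1024**3)` comes to 100 GiB, which is why an overall cap matters more than a per-app one.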






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624434103


   Let's first discuss a plan for addressing the major concerns raised in the 
comments, especially how to restrict the overall memory usage. I think that's a 
blocker for production use.






[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624415280


   Also worth mentioning that memory usage should be kept under control; 
there's no restriction for now.


