[GitHub] [spark] vanzin commented on issue #24982: [SPARK-28181][CORE] Add a filter interface to KVStore to speed up the entities retrieve

GitBox Fri, 28 Jun 2019 08:47:50 -0700

vanzin commented on issue #24982: [SPARK-28181][CORE] Add a filter interface to 
KVStore to speed up the entities retrieve
URL: https://github.com/apache/spark/pull/24982#issuecomment-506782293
 
 
   I haven't read the code (just some of the comments), but I wonder why you're 
using this approach to implement SPARK-28183.
   
   With this approach you have to load (i.e. deserialize, in the case of the 
disk store) all tasks for a particular stage in order to filter them. While I 
think the API you're adding here is OK in itself (it's basically what 
`KVUtils.viewToSeq` does and could replace it), it will be terribly slow for 
large stages (think of a stage with 100k tasks).
   
   SPARK-28183 would be way more efficient if you instead scanned the tasks 
based on the status you want, applying the offset and limit, and sorted based 
on a different property after that (because of offset and limit, you wouldn't 
have a lot of elements to sort).
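
To illustrate the difference in cost, here is a minimal sketch. It does not use Spark's KVStore classes: `TaskScanSketch`, `Task`, `loadAllThenFilter`, `scanByStatus` and the `reads` counter are all hypothetical names, and an in-memory map grouped by status stands in for an index on the task status. The `reads` counter approximates how many tasks a disk store would have to deserialize. Note that the second method sorts only the page it has already selected, so the ordering holds within the page rather than across the whole stage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a disk-backed store; not Spark's KVStore API.
final class TaskScanSketch {
  static final class Task {
    final long id;
    final String status;
    final long duration;

    Task(long id, String status, long duration) {
      this.id = id;
      this.status = status;
      this.duration = duration;
    }
  }

  // Tasks grouped by status, mimicking a secondary index on "status".
  private final Map<String, List<Task>> byStatus = new TreeMap<>();

  // Number of tasks touched by queries; a proxy for deserialization work.
  int reads = 0;

  void write(Task t) {
    byStatus.computeIfAbsent(t.status, k -> new ArrayList<>()).add(t);
  }

  // Filter-style approach: touch every task, keep the matches, then page and sort.
  List<Task> loadAllThenFilter(String status, int offset, int limit) {
    List<Task> matches = new ArrayList<>();
    for (List<Task> bucket : byStatus.values()) {
      for (Task t : bucket) {
        reads++;
        if (t.status.equals(status)) {
          matches.add(t);
        }
      }
    }
    int from = Math.min(offset, matches.size());
    int to = Math.min(from + limit, matches.size());
    List<Task> page = new ArrayList<>(matches.subList(from, to));
    page.sort(Comparator.comparingLong((Task t) -> t.duration));
    return page;
  }

  // Index-style approach: scan only the wanted status, apply offset and limit,
  // then sort the (small) page by a different property.
  List<Task> scanByStatus(String status, int offset, int limit) {
    List<Task> bucket = byStatus.getOrDefault(status, new ArrayList<>());
    List<Task> page = new ArrayList<>();
    for (int i = offset; i < bucket.size() && page.size() < limit; i++) {
      reads++;
      page.add(bucket.get(i));
    }
    page.sort(Comparator.comparingLong((Task t) -> t.duration));
    return page;
  }

  public static void main(String[] args) {
    TaskScanSketch store = new TaskScanSketch();
    for (int i = 0; i < 100_000; i++) {
      store.write(new Task(i, i % 10 == 0 ? "FAILED" : "SUCCESS", (i * 37L) % 101));
    }
    store.scanByStatus("FAILED", 20, 5);
    System.out.println("reads with index scan: " + store.reads);
    store.reads = 0;
    store.loadAllThenFilter("FAILED", 20, 5);
    System.out.println("reads with load-all filter: " + store.reads);
  }
}
```

In this sketch the index scan touches only as many tasks as the page holds, while the filter-style method touches every task in the stage regardless of offset and limit.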



