GitHub user a-roberts commented on the issue:

    https://github.com/apache/spark/pull/15713
  
    Thanks for the prompt feedback. We found this opportunity when profiling
    Spark 1.6.2 with HiBench large, where this showed up as a hot method in the
    PageRank benchmark. We can gather data to see whether it is still hot with
    Spark 2 as well, and I'm planning to contribute many similar improvements.
    
    Paraphrasing from a colleague:
    
    > This data structure is the backing data structure used by RDDs that are
    > doing group-by operations (we saw it from a PairRDD doing a groupByKey in
    > PageRank).
    >
    > The downside of the existing implementation is that every method in this
    > class has an if ... else if ... else ... that handles element 0, element 1
    > and then everything else, respectively.
    >
    > We found that on PageRank this change provides a throughput boost of
    > around 5% and costs us about 1.5 MB of estimated RDD size (86.5 MB to just
    > under 88 MB).
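    To make the branching concrete, here is a minimal Java sketch of the layout
    the quote describes. It uses two inline fields for elements 0 and 1 plus an
    overflow array for everything else. This is a simplified, hypothetical
    illustration and not the actual Spark class under discussion. The class name
    `CompactBufferSketch`, its `get`/`add` methods and the lazy growth policy are
    all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical, simplified sketch of a buffer that stores its first two
// elements in fields and the rest in an array. Not Spark's real class.
final class CompactBufferSketch<T> {
    private T element0;              // element 0, stored inline
    private T element1;              // element 1, stored inline
    private Object[] otherElements;  // elements 2..n-1, allocated on demand
    private int curSize = 0;

    T get(int position) {
        if (position < 0 || position >= curSize) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        // Every read pays for this three-way branch.
        if (position == 0) {
            return element0;
        } else if (position == 1) {
            return element1;
        } else {
            @SuppressWarnings("unchecked")
            T value = (T) otherElements[position - 2];
            return value;
        }
    }

    CompactBufferSketch<T> add(T value) {
        int newIndex = curSize;
        // Every append pays for the same three-way branch.
        if (newIndex == 0) {
            element0 = value;
        } else if (newIndex == 1) {
            element1 = value;
        } else {
            ensureCapacity(newIndex - 1); // room for slot newIndex - 2
            otherElements[newIndex - 2] = value;
        }
        curSize++;
        return this;
    }

    int size() {
        return curSize;
    }

    // Grows the overflow array so it can hold at least `required` elements.
    private void ensureCapacity(int required) {
        if (otherElements == null) {
            otherElements = new Object[Math.max(8, required)];
        } else if (otherElements.length < required) {
            otherElements = Arrays.copyOf(otherElements,
                    Math.max(required, otherElements.length * 2));
        }
    }

    public static void main(String[] args) {
        CompactBufferSketch<String> buf = new CompactBufferSketch<>();
        buf.add("a").add("b").add("c").add("d");
        // prints "4 abcd"
        System.out.println(buf.size() + " " + buf.get(0) + buf.get(1)
                + buf.get(2) + buf.get(3));
    }
}
```

    Both `get` and `add` re-check the position on every call. In a hot
    groupByKey loop, that repeated branching is the overhead the quote refers
    to. In exchange, buffers holding only one or two values never allocate the
    array.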
    
    Note that in our testing with OpenJDK 8 we saw no noticeable performance
    improvement (nor a regression), while the footprint still grew slightly (by
    2 MB rather than 1.5 MB). Ideally we'll improve performance for everybody,
    so there may be scope for further optimisations here that will benefit
    OpenJDK users too.
