GitHub user a-roberts commented on the issue:

    https://github.com/apache/spark/pull/15713
  
    In response to @rxin's question: in HiBench, CompactBuffers are **used only 
on PageRank** (none of the other 11 workloads). These buffers mostly hold between 
3 and 40 elements, never more than 60 and never only two. The PageRank 
workload processes 500k pages (large profile); we see 500k CompactBuffer 
constructor calls and 500k prints in the += method when curSize <= 2, 
indicating the buffers always expand.
    
    I don't know of any cases where we add only a couple of elements. I 
also ran all 100 SparkSqlPerf queries, and again there is no output indicating 
that we use this class (no prints from the constructor, growToSize, or the += 
methods). 
    
    Here's a breakdown of growToSize invocations (printing the curSize variable) 
with PageRank, to give an idea of how big the CompactBuffers actually become.
    
    I used the Spark WordCount example on the 677 MB stdout file containing my 
prints to generate this data. In total there are 18,762,361 growth events.
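
    For anyone who wants to reproduce the tally without a Spark job, the same 
count can be sketched in plain Python. This is not the WordCount run itself, and 
it assumes each printed line holds just the curSize integer (the exact print 
format isn't given here). The name tally_sizes and the sample lines are made up 
for illustration.

```python
from collections import Counter
from typing import Iterable


def tally_sizes(lines: Iterable[str]) -> Counter:
    """Count how often each integer size appears, one size per line."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        # Assumption: a size line is a bare non-negative integer;
        # blank lines and unrelated output are skipped.
        if text.isdigit():
            counts[int(text)] += 1
    return counts


# Hypothetical sample standing in for the real stdout file.
sample = ["3", "4", "3", "", "not a size", "5", "4", "3"]
histogram = tally_sizes(sample)

# Print in the same (size,count) shape as the breakdown.
for size, count in sorted(histogram.items()):
    print(f"({size},{count})")
```

    For a real file, passing an open file handle (for example 
tally_sizes(open(path)), with path pointing at the stdout file) streams it line 
by line, so the whole file never has to fit in memory.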
    
    ```
    (3,500000), (4,500000), (5,500000), (6,500000), (7,500000), (8,500000), 
(9,500000), (10,500000), (11,500000), (12,500000), (13,500000), (14,500000), 
(15,500000), (16,500000), (17,500000), (18,500000), (19,500000), (20,500000), 
(21,499998), (22,499995), (23,499992), (24,499978), (25,499951), (26,499879), 
(27,499729), (28,499321), (29,498517), (30,496984), (31,494114), (32,488878), 
(33,480328), (34,467214), (35,447829), (36,421619), (37,387790), (38,346826), 
(39,300660), (40,251266), (41,201702), (42,155372), (43,114024), (44,79886), 
(45,53196), (46,33580), (47,20146), (48,11569), (49,6222), (50,3143), 
(51,1491), (52,684), (53,289), (54,126), (55,39), (56,15), (57,6), (58,1), 
(59,1), (60,1)
    ```
    In each pair, the left value is the CompactBuffer size in elements and the 
right value is how many times that size appeared in the output file (that is, 
how many times a CompactBuffer grew to hold that many elements).
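
    As a sanity check on these numbers, here is a small Python sketch (not part 
of the original run) that sums the pairs and estimates how many buffers stopped 
at each size. The second step rests on an assumption: each CompactBuffer prints 
once for every size it reaches, so the difference between neighbouring counts is 
the number of buffers that stopped growing there. The names growth_counts and 
final_sizes are made up for illustration.

```python
# (size: occurrences) pairs copied from the breakdown above.
growth_counts = {
    3: 500000, 4: 500000, 5: 500000, 6: 500000, 7: 500000, 8: 500000,
    9: 500000, 10: 500000, 11: 500000, 12: 500000, 13: 500000, 14: 500000,
    15: 500000, 16: 500000, 17: 500000, 18: 500000, 19: 500000, 20: 500000,
    21: 499998, 22: 499995, 23: 499992, 24: 499978, 25: 499951, 26: 499879,
    27: 499729, 28: 499321, 29: 498517, 30: 496984, 31: 494114, 32: 488878,
    33: 480328, 34: 467214, 35: 447829, 36: 421619, 37: 387790, 38: 346826,
    39: 300660, 40: 251266, 41: 201702, 42: 155372, 43: 114024, 44: 79886,
    45: 53196, 46: 33580, 47: 20146, 48: 11569, 49: 6222, 50: 3143,
    51: 1491, 52: 684, 53: 289, 54: 126, 55: 39, 56: 15, 57: 6, 58: 1,
    59: 1, 60: 1,
}

# Total number of growth events across all sizes.
total_events = sum(growth_counts.values())
print(total_events)  # 18762361, matching the total quoted earlier

# Assumption: every buffer logs once per size it reaches, so
# count[n] - count[n + 1] buffers stopped growing at exactly n elements.
final_sizes = {
    size: count - growth_counts.get(size + 1, 0)
    for size, count in growth_counts.items()
}

# The differences telescope back to the count at the smallest size.
buffers_seen = sum(final_sizes.values())
print(buffers_seen)  # 500000

largest = max(growth_counts)
print(largest, final_sizes[largest])
```

    Under that assumption the differences add back up to 500,000 buffers, which 
lines up with the 500k constructor calls.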
    
    If there are better ways to figure this out, or other workloads to suggest, 
do let me know. I've got the code ready that replaces CompactBuffer with 
ArrayBuffer(2) for profiling and testing.

