GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5376#issuecomment-90379320
  
    The performance impact is probably negligible in most of these cases, since 
most of these methods are invoked only once or twice on the driver, but using 
`length` instead of `size` could have a large impact for code in hot loops that 
get called `O(numRecords)` times.  Unless I'm overlooking something, though, it 
doesn't look like any of the instances here are occurring in 
performance-sensitive code.
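
To make the hot-loop concern concrete, here is a minimal sketch. It is not taken from the pull request: the object and method names are hypothetical, and it assumes the usual Scala 2.10/2.11 behavior, where `size` on an `Array` is reached through the implicit `ArrayOps` wrapper while `length` is a direct array-length read. Whether the JIT removes the difference in a given loop is not something this sketch measures.

```scala
// Hypothetical illustration only; names are not from the Spark code base.
object SizeVsLength {

  // Re-evaluates `xs.size` on every iteration. On an Array, `size` is not a
  // member of the array itself; it is supplied by the implicit conversion to
  // ArrayOps, so each call may go through a wrapper.
  def sumWithSize(xs: Array[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < xs.size) {
      total += xs(i)
      i += 1
    }
    total
  }

  // Reads `xs.length` once, outside the loop. `length` on an Array is a
  // direct read of the array's length, with no implicit conversion involved.
  def sumWithLength(xs: Array[Int]): Long = {
    var total = 0L
    var i = 0
    val n = xs.length
    while (i < n) {
      total += xs(i)
      i += 1
    }
    total
  }
}
```

Both methods return the same result; any difference would only show up as per-iteration overhead when the loop body runs once per record.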
    
    Did you check to see whether there are any occurrences of `.size` outside 
of core, perhaps in SQL or MLlib, where the performance benefit might be 
greater?


