Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5376#issuecomment-90379320
The performance impact is probably negligible in most of these cases, since
most of these methods are invoked only once or twice on the driver, but using
`length` instead of `size` could have a large impact for code in hot loops that
runs `O(numRecords)` times. Unless I'm overlooking something, though, it
doesn't look like any of the instances here are occurring in
performance-sensitive code.
Did you check to see whether there are any occurrences of `.size` outside
of core, perhaps in SQL or MLlib, where the performance benefit might be
greater?
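
For reference, here is a minimal sketch of the kind of hot-loop code where the distinction could matter. It is not taken from this PR; the object and method names are hypothetical. It assumes the Scala 2.10/2.11 behavior, where `.size` on an `Array` goes through an implicit conversion to `ArrayOps`, while `.length` compiles to a direct array-length read. Whether the JIT removes that overhead in practice would need to be measured.

```scala
object SizeVsLength {
  // Hypothetical helper: sums record lengths using `.length`, which on an
  // Array is a direct read of the array's length.
  def totalWithLength(records: Array[Array[Byte]]): Long = {
    var total = 0L
    var i = 0
    while (i < records.length) {
      total += records(i).length
      i += 1
    }
    total
  }

  // Same computation using `.size`. Assumption: on an Array this call is
  // routed through the implicit ArrayOps wrapper on every invocation,
  // which is the extra cost discussed above.
  def totalWithSize(records: Array[Array[Byte]]): Long = {
    var total = 0L
    var i = 0
    while (i < records.size) {
      total += records(i).size
      i += 1
    }
    total
  }
}
```

Both methods return the same result; only the per-iteration cost may differ, which is why the change matters mainly for loops that run once per record.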