Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3254#issuecomment-63144485
This is great, thanks for looking into this! We haven't done much
profiling of these critical code sections yet, so I suspect there are
other places where we are being sub-optimal.
In general, it may be a good idea to make sure that in the critical
parts we convert to raw `Array`s, which have constant-time `length` and
indexed lookups (and which also avoid function call overhead for both, if I
understand correctly).
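
To illustrate the idea, here is a minimal sketch (not the code from this pull request) of an indexed loop over a generic `Seq` next to the same loop after a one-time conversion to an `Array`. The object and method names (`ArrayHotLoop`, `sumBySeqIndex`, `sumByArrayIndex`) are hypothetical and exist only for this example.

```scala
object ArrayHotLoop {
  // Hypothetical helper: walks a generic Seq by index.
  // If the Seq is a List, both `length` and `apply(i)` are linear time,
  // so this loop is quadratic in the number of elements.
  def sumBySeqIndex(values: Seq[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < values.length) {
      total += values(i)
      i += 1
    }
    total
  }

  // Hypothetical helper: converts to a raw Array once, then loops.
  // Array `length` and indexed lookup are constant time.
  def sumByArrayIndex(values: Seq[Int]): Long = {
    val arr: Array[Int] = values.toArray // one-time linear copy
    val n = arr.length                   // read the length once
    var total = 0L
    var i = 0
    while (i < n) {
      total += arr(i)
      i += 1
    }
    total
  }
}
```

Both methods return the same result; the difference is only in cost when the input happens to be a linked `List`. Whether the copy pays off depends on how often the collection is indexed, so it is worth profiling before applying this widely.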
I've merged to master and 1.2 to make sure this optimization at least makes
the next release.