GitHub user nongli commented on the pull request:
https://github.com/apache/spark/pull/11141#issuecomment-182141070
@davies Yes, I am. I don't think we're going to add a ton more operators, and
for all the ones you mentioned, we should think hard about serializing to the
in-memory structure the operator wants rather than just copying. For example,
CartesianProduct should probably serialize all the rows on one side
contiguously; similarly for sort with Tungsten pages. The in-memory cache
should use a columnar version and not need to first go to UnsafeRow.
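
To make the CartesianProduct point concrete, here is a minimal sketch of
serializing one side contiguously instead of holding a copied object per row.
It is plain Python, not Spark code: the row shape, `ContiguousRowBuffer`,
`encode_row`, `decode_row`, and `cartesian_product` are all hypothetical names
made up for illustration, and the byte layout (4-byte length prefix, then the
row payload) is an assumption, not the UnsafeRow format.

```python
import struct
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (id, name)


def encode_row(row: Row) -> bytes:
    """Encode one row as an 8-byte little-endian id followed by UTF-8 name bytes."""
    row_id, name = row
    return struct.pack("<q", row_id) + name.encode("utf-8")


def decode_row(buf: memoryview) -> Row:
    """Inverse of encode_row, given exactly one row's bytes."""
    (row_id,) = struct.unpack_from("<q", buf, 0)
    return row_id, bytes(buf[8:]).decode("utf-8")


class ContiguousRowBuffer:
    """Hypothetical buffer that stores many rows back to back in one bytearray."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._count = 0

    def append(self, row: Row) -> None:
        payload = encode_row(row)
        # A 4-byte length prefix lets us walk the rows without a side index.
        self._data += struct.pack("<i", len(payload))
        self._data += payload
        self._count += 1

    def __len__(self) -> int:
        return self._count

    def __iter__(self) -> Iterator[Row]:
        view = memoryview(self._data)
        offset = 0
        while offset < len(view):
            (size,) = struct.unpack_from("<i", view, offset)
            offset += 4
            yield decode_row(view[offset:offset + size])
            offset += size


def cartesian_product(
    left: Iterable[Row], right: Iterable[Row]
) -> Iterator[Tuple[Row, Row]]:
    """Buffer the right side once, contiguously, then rescan it per left row."""
    buffered = ContiguousRowBuffer()
    for row in right:
        buffered.append(row)
    for left_row in left:
        for right_row in buffered:
            yield left_row, right_row
```

The idea is that the operator serializes straight into the layout it will
rescan, so the repeated inner scan walks one contiguous region instead of
chasing per-row objects.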
I think we can take what I've done here and make it more componentized so
that using it elsewhere means less code duplication. I'm okay with not doing
this in general in the planner. The operators that need to accumulate memory
should think hard about how to do it.