GideonPotok commented on PR #46597:
URL: https://github.com/apache/spark/pull/46597#issuecomment-2118994013
What I would really like to try is to move from this implementation to an
approach that handles collation support in the partial-aggregation stage, by
moving the logic into `Mode.merge` and `Mode.update`. For that I would use a
modified open hash map that hashes on the collation key, together with a
separate map from each collation key to one of the actual values observed for
that key (experimentation has shown this could work).
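
A minimal sketch of that two-map idea, written in Python rather than Spark's Scala and not taken from the PR. The class name `CollatedModeBuffer`, its fields, and the use of `str.casefold` as a stand-in for a real collation key are all assumptions made for illustration; only the `update` / `merge` / `eval` method names mirror the ones mentioned above.

```python
from collections import Counter
from typing import Callable, Dict, Optional


class CollatedModeBuffer:
    """Hypothetical aggregation buffer: counts are kept per collation key."""

    def __init__(self, collation_key: Callable[[str], str] = str.casefold):
        # Assumption: casefold stands in for a real collation key function.
        self.collation_key = collation_key
        # Map 1: collation key -> number of values seen with that key.
        self.counts: Counter = Counter()
        # Map 2: collation key -> one actual value observed for that key.
        self.representative: Dict[str, str] = {}

    def update(self, value: str) -> None:
        """Fold one input value into this (partial) buffer."""
        key = self.collation_key(value)
        self.counts[key] += 1
        # Keep the first value seen for the key as its representative.
        self.representative.setdefault(key, value)

    def merge(self, other: "CollatedModeBuffer") -> None:
        """Combine another partial buffer into this one."""
        self.counts.update(other.counts)  # adds counts key by key
        for key, value in other.representative.items():
            self.representative.setdefault(key, value)

    def eval(self) -> Optional[str]:
        """Return an observed value for the most frequent collation key."""
        if not self.counts:
            return None
        key, _ = max(self.counts.items(), key=lambda kv: kv[1])
        return self.representative[key]
```

Because both maps are plain key/value data, a buffer like this could in principle be serialized between partial and final aggregation, which is the property the approach above relies on. Which observed value ends up as the representative depends on update and merge order in this sketch.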
But since this has already taken a couple of weeks of development, I believe
that for this PR we should confine all the collation logic to the stage that
can't be serialized and deserialized: the `eval` stage. I should then try what
I have described above in a follow-up PR, raised after we have merged the
approach that has already been tested (i.e. this PR).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]