GideonPotok commented on PR #46597:
URL: https://github.com/apache/spark/pull/46597#issuecomment-2118994013

    What I would really like to try is moving the collation-support logic into 
the PartialAggregation stage, that is, into `Mode.merge` and `Mode.update`. For 
that I would use a modified open hash map that hashes on the collation key, 
together with a separate map from each collation key to one of the actual 
values observed for that key. Experimentation has shown that this could work.
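
To make the two-map idea concrete, here is a minimal sketch in Python rather than Spark's Scala. Everything in it is an assumption for illustration: `CollatedModeBuffer` and `collation_key` are hypothetical names, `str.casefold` merely stands in for a real collation key, and plain dicts stand in for the modified open hash map. It is not the code in this PR or in Spark.

```python
def collation_key(value: str) -> str:
    # Hypothetical stand-in for a real collation key:
    # case-insensitive comparison only.
    return value.casefold()


class CollatedModeBuffer:
    """Toy aggregation buffer that counts values per collation key."""

    def __init__(self):
        self.counts = {}          # collation key -> number of occurrences
        self.representative = {}  # collation key -> first actual value seen

    def update(self, value: str) -> None:
        # Called once per input row in the partial-aggregation stage.
        key = collation_key(value)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.representative.setdefault(key, value)

    def merge(self, other: "CollatedModeBuffer") -> None:
        # Combines two partial buffers; keys already agree across buffers
        # because both sides hashed on the collation key.
        for key, n in other.counts.items():
            self.counts[key] = self.counts.get(key, 0) + n
            self.representative.setdefault(key, other.representative[key])

    def eval(self):
        # No collation work is left here: pick the most frequent key
        # and return the value recorded for it.
        if not self.counts:
            return None
        best = max(self.counts, key=self.counts.get)
        return self.representative[best]
```

With this shape, equal-under-collation values are folded together as soon as they are seen, so the buffers that get shipped between stages hold one entry per collation key instead of one per distinct raw value.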
   
   But since this has already taken a couple of weeks of development, I believe 
we should, for this PR, confine all the collation logic to the one stage that 
can't be serialized and deserialized: the `eval` stage. I will try what I have 
described above in a follow-up PR, raised after we have merged the approach 
that has already been tested (i.e. this PR).
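
For comparison, confining the collation logic to `eval` could look roughly like the sketch below, again in Python and under stated assumptions: `eval_mode` and `collation_key` are hypothetical names, `str.casefold` stands in for a real collation key, and the buffer is assumed to be a plain map from raw value to count that `update` and `merge` maintain without any collation awareness.

```python
def collation_key(value: str) -> str:
    # Hypothetical stand-in for a real collation key:
    # case-insensitive comparison only.
    return value.casefold()


def eval_mode(buffer: dict):
    """Fold a raw value -> count buffer by collation key, then pick the mode."""
    grouped = {}  # collation key -> (representative value, total count)
    for value, count in buffer.items():
        key = collation_key(value)
        representative, total = grouped.get(key, (value, 0))
        grouped[key] = (representative, total + count)
    if not grouped:
        return None
    # Return the representative of the key with the largest total count.
    representative, _ = max(grouped.values(), key=lambda pair: pair[1])
    return representative
```

The trade-off in this sketch is that the buffer carries one entry per distinct raw value until the final step, while the update and merge paths stay untouched.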
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

