JoshRosen commented on issue #24557: [SPARK-27653][SQL] Add max_by() SQL 
aggregate function
URL: https://github.com/apache/spark/pull/24557#issuecomment-490549494
 
 
   Hi @viirya,
   
   Thanks for working on this! 
   
   I had a few quick questions:
   
   - Could you also implement `min_by(x, y)`?
     - It looks like you might be able to share most of the code except for 
replacing `GreaterThan` and `greatest`, so maybe this difference could be 
abstracted away via a shared abstract superclass.
   - Presto also has three-argument versions of `max_by` / `min_by`:
   
     > max_by(x, y, n) → array<[same as x]>
     > Returns n values of x associated with the n largest of all input values 
of y in descending order of y.
   
     I don't think we need to do this version now, especially since we can 
always add it in a separate followup PR (which is what Presto originally did: 
https://github.com/prestodb/presto/issues/3620)
   - Were there any bugs in older implementations of Presto version that we 
might have replicated here? Or Presto tests for edge-cases that we could 
emulate?
     - https://github.com/prestodb/presto/issues/7646 discusses using rows / 
structs as the ordering value. I suspect that would work already in your 
implementation, but it might be nice to check.
     - https://github.com/prestodb/presto/issues/2040 discusses a problem with 
null ordering values. It looks like you're testing that case, but I just wanted 
to double-check that we've covered the same edge-case there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to