Re: [PR] [SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set [spark]

via GitHub Sun, 15 Dec 2024 23:10:16 -0800


eejbyfeldt commented on PR #47149:
URL: https://github.com/apache/spark/pull/47149#issuecomment-2544769840


   > @eejbyfeldt yea this is worth doing. Do we just rewrite it to 
`collect_list(...) FILTER (WHERE input IS NOT NULL)`?
   
   No. `IGNORE NULLS` is already the default behavior for the collect 
aggregates, so this PR is actually adding support for `RESPECT NULLS` by 
allowing `ignoreNulls` to be set to false. 
   
   That said, it sounds reasonable to rewrite `IGNORE NULLS` to an aggregate 
with a filter, so that each aggregate does not need this logic. It is not 
clear to me where that rewrite should happen, though.
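
To make the equivalence concrete, here is a minimal sketch in plain Python (not Spark code). The `collect_list` function below and its `ignore_nulls` and `where` parameters are hypothetical names used only to model the semantics discussed above; they are not the signatures used in the PR. Under that assumption, `IGNORE NULLS` gives the same result as `RESPECT NULLS` combined with `FILTER (WHERE input IS NOT NULL)`:

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def collect_list(
    values: Iterable[Optional[T]],
    ignore_nulls: bool = True,  # models IGNORE NULLS, the current default
    where: Optional[Callable[[Optional[T]], bool]] = None,  # models FILTER (WHERE ...)
) -> List[Optional[T]]:
    """Toy model of a collect_list aggregate over one group of input values."""
    out: List[Optional[T]] = []
    for v in values:
        # The FILTER clause is applied before the row reaches the aggregate.
        if where is not None and not where(v):
            continue
        # IGNORE NULLS: the aggregate itself skips NULL inputs.
        if ignore_nulls and v is None:
            continue
        out.append(v)
    return out


rows = [1, None, 2, None, 2]

ignore = collect_list(rows)                       # IGNORE NULLS (default)
respect = collect_list(rows, ignore_nulls=False)  # RESPECT NULLS
rewritten = collect_list(rows, ignore_nulls=False, where=lambda v: v is not None)
```

In this model `ignore` and `rewritten` contain the same elements, which is why the null handling could in principle live in a single rewrite rather than in each aggregate.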


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

