MaxGekk commented on issue #27834: revert [SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default
URL: https://github.com/apache/spark/pull/27834#issuecomment-595751507

> returning -1 looks reasonable as well.

For the full picture, -1 may lead to wrong query results. Here is the example I got from @ssimeonov:

"A client discovered this behavior while investigating post-click user engagement in their AdTech system. The schema was per ad placement and post-click user engagements were in an array of structs. The culprit was **df.groupBy('placementId).agg(sum(size('engagements)).as("engagement_count"), ...)**, which subtracted 1 for every click without post-click engagement. Luckily, the behavior led to negative engagement counts in some periods, which alerted them to the problem and this bizarre behavior."
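
The arithmetic behind the quoted report can be sketched in plain Scala, without Spark. This is an illustrative model only: the rows, the object name `SizeOfNullSketch`, and the helpers `legacySize`, `nullSize`, and `sqlSum` are hypothetical, `None` stands in for a NULL `engagements` array, and `sqlSum` mimics the SQL rule that `SUM` skips NULL inputs and yields NULL when every input is NULL.

```scala
object SizeOfNullSketch {
  // Hypothetical rows: (placementId, engagements). None models a NULL array.
  val clicks: Seq[(String, Option[Seq[String]])] = Seq(
    ("p1", Some(Seq("view", "signup"))),
    ("p1", None),
    ("p1", None),
    ("p2", None),
    ("p2", None)
  )

  // Legacy behavior discussed in the thread: size(NULL) evaluates to -1.
  def legacySize(xs: Option[Seq[String]]): Int = xs.map(_.size).getOrElse(-1)

  // NULL-returning behavior: size(NULL) is NULL, modeled here as None.
  def nullSize(xs: Option[Seq[String]]): Option[Int] = xs.map(_.size)

  // Model of SQL SUM: NULL inputs are skipped; all-NULL input yields NULL.
  def sqlSum(xs: Seq[Option[Int]]): Option[Int] = {
    val present = xs.flatten
    if (present.isEmpty) None else Some(present.sum)
  }

  // Each click without engagements contributes -1 to the total.
  val legacyCounts: Map[String, Int] =
    clicks.groupBy(_._1).map { case (id, rows) =>
      id -> rows.map(r => legacySize(r._2)).sum
    }

  // Clicks without engagements are simply ignored by the sum.
  val nullAwareCounts: Map[String, Option[Int]] =
    clicks.groupBy(_._1).map { case (id, rows) =>
      id -> sqlSum(rows.map(r => nullSize(r._2)))
    }

  def main(args: Array[String]): Unit = {
    println(s"legacy:     $legacyCounts")
    println(s"null-aware: $nullAwareCounts")
  }
}
```

With these made-up rows, the legacy model reports 0 engagements for `p1` (2 - 1 - 1) and -2 for `p2`, while the NULL-aware model reports 2 for `p1` and NULL for `p2`. A negative total like the one for `p2` is the kind of symptom the quoted client noticed; the `p1` case is arguably worse, because the undercount is silent.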
