MaxGekk commented on issue #27834: revert [SPARK-24640][SQL] Return `NULL` from 
`size(NULL)` by default
URL: https://github.com/apache/spark/pull/27834#issuecomment-595751507
 
 
   > returning -1 looks reasonable as well.
   
   For the full picture: returning -1 can lead to wrong query results. Here is 
an example I got from @ssimeonov:
   
   "A client discovered this behavior while investigating post-click user 
engagement in their AdTech system. The schema was per ad placement and 
post-click user engagements were in an array of structs. The culprit was 
**df.groupBy('placementId).agg(sum(size('engagements)).as("engagement_count"), 
...)**, which subtracted 1 for every click without post-click engagement. 
Luckily, the behavior led to negative engagement counts in some periods, which 
alerted them to the problem and this bizarre behavior."
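
The arithmetic behind that report can be sketched in plain Scala, without Spark. This is only a model under stated assumptions: `None` stands in for a NULL `engagements` array, the sample rows and the helper names (`legacySize`, `nullSize`, `legacyCounts`, `nullAwareCounts`) are hypothetical, and the second aggregation imitates the usual SQL rule that `sum` skips NULL inputs and returns NULL when every input is NULL.

```scala
// Plain-Scala model of the aggregation; this is not Spark code.
// None stands in for a NULL `engagements` array.
object SizeOfNullSketch {
  // Hypothetical sample rows: (placementId, engagements)
  val clicks: Seq[(String, Option[Seq[String]])] = Seq(
    ("p1", Some(Seq("view", "signup"))),
    ("p1", None),
    ("p1", None),
    ("p2", None),
    ("p2", None)
  )

  // Legacy semantics: size(NULL) is -1
  def legacySize(xs: Option[Seq[String]]): Int =
    xs.map(_.size).getOrElse(-1)

  // NULL-returning semantics: size(NULL) is NULL, modelled as None
  def nullSize(xs: Option[Seq[String]]): Option[Int] =
    xs.map(_.size)

  // Every click without engagements contributes -1 to the total
  def legacyCounts: Map[String, Int] =
    clicks.groupBy(_._1).map { case (id, rows) =>
      id -> rows.map(r => legacySize(r._2)).sum
    }

  // NULL sizes are skipped; a group with only NULLs yields None
  def nullAwareCounts: Map[String, Option[Int]] =
    clicks.groupBy(_._1).map { case (id, rows) =>
      val sizes = rows.flatMap(r => nullSize(r._2))
      id -> (if (sizes.isEmpty) None else Some(sizes.sum))
    }

  def main(args: Array[String]): Unit = {
    println(legacyCounts)
    println(nullAwareCounts)
  }
}
```

With these rows, the legacy model reports 0 engagements for `p1` (2 - 1 - 1) and -2 for `p2`, while the NULL-returning model reports 2 for `p1` and no value for `p2`.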
   
