Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17226
  
    I see. So, the `count` behaviour in "**Spark 2.1.0** (and presumably 2.0.x/master)" was 
unexpectedly introduced by the optimization in SPARK-13749, and this behaviour 
change between 1.6 and master (whether it is right or not) has only now been 
discovered along with it.
    
    So, if I understand correctly, several problems are mixed together here:
    
     1. `count` incorrectly counting `null` values (for both the optimized and non-optimized paths)
    
     2. an NPE (for the optimized path)
    
     3. `count` returning `0` vs `null` for missing values
    
    and this PR tries to fix 1. and 2., 
    whereas mine tries to fix 1. and 2. plus a few specific cases in 3. (by 
avoiding the optimization as a workaround).
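    For context, the distinction in 3. comes down to whether a missing pivot cell is materialized by the aggregate (a `count` over an empty group is `0`) or simply left unfilled (`null`). A minimal standalone Python sketch of the two behaviours (a toy pivot, not Spark's actual implementation; the function and data here are hypothetical) might look like:

    ```python
    # Toy data: (group, pivot_key, value). Group "b" has no rows for key "y",
    # and group "a" has a row for "y" whose value is null.
    rows = [("a", "x", 1), ("a", "y", None), ("b", "x", 2)]
    keys = ["x", "y"]

    def pivot_count(rows, keys, fill_missing_with_zero):
        # count() semantics: null values are never counted.
        table = {}
        for g, k, v in rows:
            cell = table.setdefault(g, {})
            cell[k] = cell.get(k, 0) + (1 if v is not None else 0)
        # A missing cell is either 0 (count over an empty group) or None (unfilled).
        fill = 0 if fill_missing_with_zero else None
        return {g: {k: cell.get(k, fill) for k in keys}
                for g, cell in table.items()}
    ```

    Under the first behaviour `pivot_count(rows, keys, True)["b"]["y"]` is `0`; under the second, `pivot_count(rows, keys, False)["b"]["y"]` is `None` — which is exactly the discrepancy observed between the two code paths.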
    
    Okay, I am fine with closing mine (honestly, the initial version in my PR 
was almost identical to this PR). 
    
    Thanks for elaborating and for bearing with me.
