Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68433103
  
    For Enums, this patch seems like a strict improvement over the status quo.
The strengthening of the array checks is the only potentially controversial
change, but I think it's extremely unlikely to break user programs (it could
only affect users who tried to use combineByKey with array keys and a custom
serializer, which seems like an unlikely use case); besides, any program that
this change breaks was probably producing wrong results anyway, so it's better
to fail loudly.
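
    To illustrate why failing loudly is the right call here (a minimal sketch,
not code from this patch): Scala/Java arrays use identity-based equals and
hashCode, so equal-looking array keys never collapse into the same key:

        val a = Array(1, 2, 3)
        val b = Array(1, 2, 3)
        println(a == b)                    // false -- reference equality
        println(a.hashCode == b.hashCode)  // false in practice -- identity hashes
        // A hash-partitioned shuffle keyed on `a` and `b` therefore treats them
        // as distinct keys and silently returns wrong aggregates.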
    
    I guess there are still a few cases that could slip through the cracks:
    
    - Java users who use custom serializers.
    - Cases where the Java API uses the wrong manifest and can't tell that
we've passed an array.
    
    I think both of these cases can only be detected with runtime checks on the
first record being shuffled.  Maybe we should add those as part of a separate
PR, though, if we think they're worthwhile.
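
    For what it's worth, such a first-record check could look roughly like the
sketch below; the helper name and where it would be hooked in are assumptions
on my part, not code from this PR:

        // Hypothetical sketch: validate only the key class of the first record
        // seen on the shuffle write path.
        def throwOnArrayKey(record: Product2[Any, Any]): Unit = {
          if (record._1 != null && record._1.getClass.isArray) {
            throw new org.apache.spark.SparkException(
              "Cannot use array keys in a shuffle: arrays have identity-based " +
              "hashCode and equals")
          }
        }

        def checkFirstRecord(
            records: Iterator[Product2[Any, Any]]): Iterator[Product2[Any, Any]] = {
          var checked = false
          records.map { kv =>
            if (!checked) { throwOnArrayKey(kv); checked = true }
            kv
          }
        }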

