Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@ueshin Thanks for commenting.
It's unfortunate that users find nullability confusing. If you're coming
from the SQL world, you should be quite familiar with nullability and null
values. Nevertheless, Spark has a few issues with nullability, this being one
of them, that I believe need to be addressed. And the fact that the Catalyst
optimizer considers nullability in several optimization rules makes this all
the more important.
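To illustrate why nullability matters to the optimizer, here is a minimal sketch (the exact plan output depends on the Spark version). When a column is declared non-nullable, Catalyst's null-propagation rules can fold `IsNotNull` checks to true and prune the filter:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType

spark = SparkSession.builder.getOrCreate()

# Declare `id` as non-nullable in the schema.
schema = StructType([StructField("id", LongType(), nullable=False)])
df = spark.createDataFrame([(1,), (2,)], schema)

# Since Catalyst knows `id` can never be null, it can simplify
# IsNotNull(id) to true and remove this filter from the plan.
df.filter(df["id"].isNotNull()).explain()
```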
As for a use case, consider any platform built on Spark where a null value
is treated as a "real" value, whether valid or invalid in the given context,
and where data must conform to a particular schema. When Python UDFs are
used, nullability must be specified correctly for that conformance to work;
see the sketch below.
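To make that concrete, here is a minimal sketch (the data and column names are hypothetical). Even though the UDF below can never return null, Spark marks its output column as nullable, so the result cannot match a target schema that declares the column non-nullable:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "code"])

# This UDF never returns None, but there is no way to declare that:
# the returnType carries only the data type, not nullability.
normalize = udf(lambda s: s.upper(), StringType())

result = df.withColumn("code_norm", normalize(df["code"]))
result.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- code: string (nullable = true)
#  |-- code_norm: string (nullable = true)   <- forced nullable
```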
Also, a PR was recently merged to address this issue on the Scala side.
https://issues.apache.org/jira/browse/SPARK-20668