Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@ueshin Thanks for commenting.
It's unfortunate that users find nullability confusing. If you're coming
from the SQL world, you should be quite familiar with nullability and null
values. Nevertheless, Spark has a few issues with nullability, this being one
of them, that I believe need to be addressed. And the fact that the Catalyst
optimizer considers nullability in several optimization rules makes this all
the more important.
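To illustrate why nullability matters to the optimizer, here is a minimal sketch (the exact plan output depends on the Spark version). When a column is declared non-nullable, Catalyst's null-propagation rules can fold `IsNotNull` checks to true and prune the filter:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType

spark = SparkSession.builder.getOrCreate()

# Declare `id` as non-nullable in the schema.
schema = StructType([StructField("id", LongType(), nullable=False)])
df = spark.createDataFrame([(1,), (2,)], schema)

# Since Catalyst knows `id` can never be null, it can simplify
# IsNotNull(id) to true and remove this filter from the plan.
df.filter(df["id"].isNotNull()).explain()
```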
As for a use case, consider any platform built on Spark where a null value
is treated as a "real" value, whether valid or invalid in the given context,
and where data must conform to a particular schema. When Python UDFs are
used, nullability must be specified correctly for that conformance to work;
see the sketch below.
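To make that concrete, here is a minimal sketch (the data and column names are hypothetical). Even though the UDF below can never return null, Spark marks its output column as nullable, so the result cannot match a target schema that declares the column non-nullable:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "code"])

# This UDF never returns None, but there is no way to declare that:
# the returnType carries only the data type, not nullability.
normalize = udf(lambda s: s.upper(), StringType())

result = df.withColumn("code_norm", normalize(df["code"]))
result.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- code: string (nullable = true)
#  |-- code_norm: string (nullable = true)   <- forced nullable
```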
Also, a PR was recently merged to address this issue on the Scala side.
https://issues.apache.org/jira/browse/SPARK-20668