[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-47927.
---------------------------------
    Fix Version/s: 3.4.4
                   3.5.2
                   4.0.0
       Resolution: Fixed

Issue resolved by pull request 46156
[https://github.com/apache/spark/pull/46156]

> Nullability after join not respected in UDF
> -------------------------------------------
>
>                 Key: SPARK-47927
>                 URL: https://issues.apache.org/jira/browse/SPARK-47927
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1, 3.4.3
>            Reporter: Emil Ejbyfeldt
>            Assignee: Emil Ejbyfeldt
>            Priority: Major
>              Labels: correctness, pull-request-available
>             Fix For: 3.4.4, 3.5.2, 4.0.0
>
>
> {code:java}
> // Needed outside spark-shell; spark-shell already has both in scope.
> import org.apache.spark.sql.functions.{struct, udf}
> import spark.implicits._
>
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()  // empty, so the outer join yields a null on this side
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), "outer")
>   .select(f(struct(ds1("value"), ds2("value")))).show()
> ds1.join(ds2, ds1("value") === ds2("value"), "outer")
>   .select(struct(ds1("value"), ds2("value"))).show()
> {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+
> {code}
> So when the result is passed to a UDF, the nullability introduced by the
> join is not respected, and we incorrectly end up with a 0 value instead of
> a null/None value.
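
For readers unfamiliar with why a missing value can surface as 0 rather than None, here is a minimal plain-Scala sketch. It does not use Spark and is not the code path changed by the pull request; the object and method names are hypothetical and exist only for illustration. It relies on one Scala fact: casting a null reference to Int with asInstanceOf produces the default value 0. A decoder that treats a field as non-nullable therefore produces Some(0), which matches the symptom in the report, while a decoder that checks for null first produces None.

```scala
// Plain-Scala illustration only. No Spark involved; all names are hypothetical.
object NullabilitySketch {
  // Stand-in for the right-hand column value after an outer join with no match.
  val missing: Any = null

  // Treats the field as non-nullable: casts straight to a primitive Int.
  // In Scala, null.asInstanceOf[Int] evaluates to 0, so the null is silently lost.
  def decodeIgnoringNulls(v: Any): Option[Int] =
    Some(v.asInstanceOf[Int])

  // Treats the field as nullable: Option(null) is None, so the cast only
  // runs when a value is actually present.
  def decodeRespectingNulls(v: Any): Option[Int] =
    Option(v).map(_.asInstanceOf[Int])

  def main(args: Array[String]): Unit = {
    println((1, decodeIgnoringNulls(missing)))   // prints (1,Some(0))
    println((1, decodeRespectingNulls(missing))) // prints (1,None)
  }
}
```

The first result corresponds to the {1, 0} row in the report and the second to the expected {1, NULL} row.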