[ https://issues.apache.org/jira/browse/SPARK-26366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-26366. ----------------------------- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 > Except with transform regression > -------------------------------- > > Key: SPARK-26366 > URL: https://issues.apache.org/jira/browse/SPARK-26366 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 2.3.2 > Reporter: Dan Osipov > Assignee: Marco Gaido > Priority: Major > Fix For: 2.4.1, 3.0.0 > > > There appears to be a regression between Spark 2.2 and 2.3. Below is the code > to reproduce it: > > {code:java} > import org.apache.spark.sql.functions.col > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > val inputDF = spark.sqlContext.createDataFrame( > spark.sparkContext.parallelize(Seq( > Row("0", "john", "smith", "j...@smith.com"), > Row("1", "jane", "doe", "j...@doe.com"), > Row("2", "apache", "spark", "sp...@apache.org"), > Row("3", "foo", "bar", null) > )), > StructType(List( > StructField("id", StringType, nullable=true), > StructField("first_name", StringType, nullable=true), > StructField("last_name", StringType, nullable=true), > StructField("email", StringType, nullable=true) > )) > ) > val exceptDF = inputDF.transform( toProcessDF => > toProcessDF.filter( > ( > col("first_name").isin(Seq("john", "jane"): _*) > and col("last_name").isin(Seq("smith", "doe"): _*) > ) > or col("email").isin(List(): _*) > ) > ) > inputDF.except(exceptDF).show() > {code} > Output with Spark 2.2: > {noformat} > +---+----------+---------+----------------+ > | id|first_name|last_name| email| > +---+----------+---------+----------------+ > | 2| apache| spark|sp...@apache.org| > | 3| foo| bar| null| > +---+----------+---------+----------------+{noformat} > Output with Spark 2.3: > {noformat} > +---+----------+---------+----------------+ > | id|first_name|last_name| email| > +---+----------+---------+----------------+ > | 2| apache| spark|sp...@apache.org| > +---+----------+---------+----------------+{noformat} > Note, changing the last line to > {code:java} > inputDF.except(exceptDF.cache()).show() > {code} > produces identical output for both Spark 2.3 and 2.2 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org