[ https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-21228.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 2.3.0

> InSet incorrect handling of structs
> -----------------------------------
>
>                 Key: SPARK-21228
>                 URL: https://issues.apache.org/jira/browse/SPARK-21228
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Bogdan Raducanu
>            Assignee: Bogdan Raducanu
>             Fix For: 2.3.0
>
>
> In InSet it's possible that hset contains GenericInternalRows while child returns UnsafeRows (and vice versa). InSet uses hset.contains (both in doCodeGen and eval) which will always be false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the default is 10 which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 2L),named_struct('a', 3L, 'b', 3L))").show // the Aggregate here will return UnsafeRows while the list of structs that will become hset will be GenericInternalRows
> +----+
> |minA|
> +----+
> +----+
> {code}
> In.doCodeGen uses compareStructs and seems to work. In.eval might not work but not sure how to reproduce.
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 2L),named_struct('a', 3L, 'b', 3L))").show
> +-----+
> | minA|
> +-----+
> |[1,1]|
> +-----+
> {code}
> Solution could be either to do safe<->unsafe conversion in InSet or not trigger InSet optimization at all in this case.
> Need to investigate if In.eval is affected.
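
The mismatch described in the issue can be illustrated without Spark. The following is a minimal sketch, not Spark code: `RowLike`, `GenericRowLike`, `UnsafeRowLike` and `InSetSketch` are hypothetical stand-ins invented for this example, assuming only that the two row representations define equality within their own class. It shows why a hash-set lookup misses when the set holds one representation and the probe value uses the other, and how comparing on a common representation (roughly the "safe<->unsafe conversion" option mentioned in the issue) avoids the miss. It does not claim to reflect how the issue was actually fixed.

```scala
import java.nio.ByteBuffer
import java.util.Arrays

// Hypothetical stand-ins for the two row representations; NOT Spark classes.
sealed trait RowLike { def fields: Seq[Long] }

// Stores fields as a plain sequence (plays the role of GenericInternalRow).
final case class GenericRowLike(fields: Seq[Long]) extends RowLike

// Stores fields packed into bytes (plays the role of UnsafeRow).
// Equality is defined only against other byte-backed rows.
final class UnsafeRowLike(private val bytes: Array[Byte]) extends RowLike {
  def fields: Seq[Long] = {
    val buf = ByteBuffer.wrap(bytes)
    Seq.fill(bytes.length / 8)(buf.getLong())
  }
  override def equals(other: Any): Boolean = other match {
    case that: UnsafeRowLike => Arrays.equals(bytes, that.bytes)
    case _                   => false
  }
  override def hashCode(): Int = Arrays.hashCode(bytes)
}

object UnsafeRowLike {
  def apply(values: Long*): UnsafeRowLike = {
    val buf = ByteBuffer.allocate(values.length * 8)
    values.foreach(v => buf.putLong(v))
    new UnsafeRowLike(buf.array())
  }
}

object InSetSketch {
  // The literal list from the query, held in the "generic" representation.
  val hset: Set[RowLike] = Set(
    GenericRowLike(Seq(1L, 1L)),
    GenericRowLike(Seq(2L, 2L)),
    GenericRowLike(Seq(3L, 3L))
  )

  // Direct lookup: relies on equals/hashCode of whatever class the probe has.
  def naiveContains(value: RowLike): Boolean = hset.contains(value)

  // Lookup after converting both sides to a common representation.
  private val normalized: Set[Seq[Long]] = hset.map(_.fields)
  def normalizedContains(value: RowLike): Boolean =
    normalized.contains(value.fields)

  def main(args: Array[String]): Unit = {
    val childValue = UnsafeRowLike(1L, 1L) // what the child expression returns
    println(naiveContains(childValue))      // false: same contents, different class
    println(normalizedContains(childValue)) // true
  }
}
```

The sketch normalizes the set once up front and the probe value on each lookup; whether that, or skipping the set-based path for struct types altogether, is preferable in practice depends on costs that this example does not model.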
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org