WangGuangxin commented on issue #23942: [SPARK-27033][SQL]Add Optimize rule RewriteArithmeticFiltersOnIntegralColumn
URL: https://github.com/apache/spark/pull/23942#issuecomment-471254770

> How do you handle this behaviour change?
>
> ```
> // v2.4.0
> scala> Seq(0, Int.MaxValue).toDF("v").write.saveAsTable("t")
> scala> sql("select * from t").show
> +----------+
> |         v|
> +----------+
> |         0|
> |2147483647|
> +----------+
>
> scala> sql("select * from t where v + 1 > 0").show
> +---+
> |  v|
> +---+
> |  0|
> +---+
>
> // this pr
> scala> sql("select * from t where v + 1 > 0").show
> +----------+
> |         v|
> +----------+
> |         0|
> |2147483647|
> +----------+
> ```

This is a bad case I hadn't thought about before. I found there are four kinds of cases:

- `v + 1 > 0` => `v > -1 and v <= Int.MAX - 1`
- `v - 1 > 0` => `v > 1 or (v < Int.MIN + 1 && v > 0 - 1 + Int.MIN - Int.MAX)`
- `v + 1 < 0` => `v < -1 or (v > Int.MAX - 1 && v < 0 - 1 + Int.MAX - Int.MIN)`
- `v - 1 < 0` => `v < 1 and v >= Int.MIN + 1`

After the rewrite, a single inequality may need two or three inequalities, which makes the expressions much more complex. So I don't think it is worth converting inequalities; we may only handle `=` or `!=` here. What do you think?
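For reference, here is a quick way to see why the extra bound in the first rewrite is needed. JVM `Int` addition wraps on overflow (as Spark 2.4 does for `IntegerType`), so `Int.MaxValue + 1` becomes `Int.MinValue`, and a naive rewrite of `v + 1 > 0` to `v > -1` would wrongly keep `Int.MaxValue`. The snippet below is only a standalone sketch, not code from this PR; the object name and the sample values are made up for illustration. It compares the original predicate, the naive rewrite, and the guarded rewrite from the first bullet on boundary values:

```scala
object RewriteOverflowCheck {
  // Boundary values around Int.MinValue / Int.MaxValue plus a few ordinary ones.
  val samples: Seq[Int] =
    Seq(Int.MinValue, Int.MinValue + 1, -2, -1, 0, 1, Int.MaxValue - 1, Int.MaxValue)

  def main(args: Array[String]): Unit = {
    samples.foreach { v =>
      val original = v + 1 > 0                       // wraps at v = Int.MaxValue, so it is false there
      val naive    = v > -1                          // rewrite without an overflow guard: true at Int.MaxValue
      val proposed = v > -1 && v <= Int.MaxValue - 1 // guarded rewrite: `v > -1 and v <= Int.MAX - 1`
      println(f"v=$v%11d  original=$original%-5s  naive=$naive%-5s  proposed=$proposed%-5s")
    }
  }
}
```

Only `v = Int.MaxValue` makes `naive` disagree with `original`, and that is exactly the value the extra `v <= Int.MAX - 1` bound filters out.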
