aokolnychyi commented on issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
URL: https://github.com/apache/spark/pull/23171#issuecomment-447558507

@mgaido91

1) I think it is the right way to go. There is one point that is still not clear to me: what happens to `"spark.sql.optimizer.inSetConversionThreshold"`? Do we keep or remove this property? In other words, do we still keep the threshold if `In` is no longer converted into `InSet`? According to PR #23291, certain data types become slightly slower if we use `InSet` in all cases: for example, structs, arrays, and small decimals are faster with the if-else approach on fewer than 10 elements. At the same time, if we keep the threshold and move the switch-based logic to `InSet`, then cases with 5-10 elements will not benefit from this optimization. I do not have a strong opinion on this. I tend to think that handling all cases with literals in `InSet` is a clean solution; I just want to know what happens to the config property. @rxin @gatorsmile @dbtsai @cloud-fan @viirya @kiszk, what do you think?

2) Yep, there are definitely ways to speed up longs. We should consider your ideas as well as the idea mentioned by @dbtsai earlier. I think it should be a follow-up PR.
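For readers following along: the tradeoff being discussed can be sketched as plain Java. This is a hypothetical illustration, not Spark's actual generated code — `inSmall`, `inSet`, and the threshold constant are made up here. It contrasts the unrolled comparison chain that `In` effectively produces for a short literal list with the set lookup that `InSet` performs, which is what the `inSetConversionThreshold` config chooses between.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class InVsInSetSketch {
    // Hypothetical stand-in for spark.sql.optimizer.inSetConversionThreshold.
    static final int THRESHOLD = 10;

    // Below the threshold: an unrolled if-else / comparison chain,
    // analogous to what In generates for a few literal values.
    static boolean inSmall(int v) {
        return v == 1 || v == 3 || v == 7;
    }

    // At or above the threshold: a hash-set lookup, analogous to InSet.
    static final Set<Integer> VALUES = new HashSet<>(Arrays.asList(1, 3, 7));

    static boolean inSet(int v) {
        return VALUES.contains(v);
    }

    public static void main(String[] args) {
        // Both strategies are semantically identical; only cost differs:
        // the chain is cheap for a handful of values, the set wins as the
        // literal list grows past the threshold.
        System.out.println(inSmall(3) == inSet(3));
        System.out.println(inSmall(4) == inSet(4));
    }
}
```

The open question in the comment is exactly where (and whether) such a threshold should live once the switch-based logic moves into `InSet`.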
