aokolnychyi commented on issue #23171: [SPARK-26205][SQL] Optimize In for 
bytes, shorts, ints
URL: https://github.com/apache/spark/pull/23171#issuecomment-447558507
 
 
   @mgaido91 
   
   1) I think it is the right way to go.
   
   There is one point that is not clear to me yet:
   
   What happens to `"spark.sql.optimizer.inSetConversionThreshold"`? Do we keep 
or remove this property? In other words, do we still keep a threshold below 
which `In` is not converted into `InSet`? 
   
   According to PR #23291, certain data types will become slightly slower if we 
use `InSet` in all cases. For example, structs, arrays, and small decimals are 
faster with the if-else approach on fewer than 10 elements.
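
   To make the comparison concrete, here is a minimal sketch of the three 
evaluation styles under discussion for `x IN (1, 3, 5, 7)`. It is hand-written 
Java for illustration only, not the code Spark generates; the class name 
`InStrategies` and the literal values are assumptions of mine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not Spark's actual generated code.
final class InStrategies {
    // Literal list for `x IN (1, 3, 5, 7)`, boxed once up front.
    private static final Set<Integer> VALUE_SET =
        new HashSet<>(Arrays.asList(1, 3, 5, 7));

    // If-else style: one comparison per literal, in order.
    static boolean viaIfElse(int x) {
        if (x == 1) {
            return true;
        } else if (x == 3) {
            return true;
        } else if (x == 5) {
            return true;
        } else if (x == 7) {
            return true;
        }
        return false;
    }

    // Switch style: javac typically compiles an int switch to a
    // tableswitch or lookupswitch bytecode instruction.
    static boolean viaSwitch(int x) {
        switch (x) {
            case 1:
            case 3:
            case 5:
            case 7:
                return true;
            default:
                return false;
        }
    }

    // Set style: a hash lookup; the int argument is boxed to Integer.
    static boolean viaSet(int x) {
        return VALUE_SET.contains(x);
    }
}
```

All three return the same answer for any input; only the cost profile differs, 
which is what the benchmarks in the PRs measure.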
   
   At the same time, if we keep the threshold and move the switch-based logic 
to `InSet`, then cases with 5-10 elements will not benefit from this 
optimization.
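
   The trade-off between the two options can be sketched as a small decision 
function. This is only a model of the discussion, not Spark's optimizer code: 
`InPlanChoice`, `keepThreshold`, and `alwaysInSet` are hypothetical names, and I 
assume `In` is converted only when the list is strictly longer than the 
threshold.

```java
// Hypothetical model of the two options; not Spark's optimizer code.
final class InPlanChoice {
    enum Strategy { IF_ELSE, SWITCH, HASH_SET }

    // Option A: keep the threshold and put the switch-based logic in InSet.
    // Lists at or below the threshold stay as In and use if-else
    // (assumption: conversion happens only when numLiterals > threshold).
    static Strategy keepThreshold(int numLiterals, int threshold,
                                  boolean smallIntegralType) {
        if (numLiterals <= threshold) {
            return Strategy.IF_ELSE;
        }
        return smallIntegralType ? Strategy.SWITCH : Strategy.HASH_SET;
    }

    // Option B: handle every all-literal list in InSet, ignoring list size.
    static Strategy alwaysInSet(boolean smallIntegralType) {
        return smallIntegralType ? Strategy.SWITCH : Strategy.HASH_SET;
    }

    public static void main(String[] args) {
        int threshold = 10; // example value for the config property
        // Seven int literals: option A never reaches the switch path.
        System.out.println(keepThreshold(7, threshold, true));
        // Seven struct literals: option B falls back to a hash lookup,
        // the case PR #23291 reports as slightly slower.
        System.out.println(alwaysInSet(false));
    }
}
```

With a threshold of 10, option A leaves a 7-element list of ints on the if-else 
path, while option B moves a 7-element list of structs onto the hash set path.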
   
   I do not have a strong opinion on this. I tend to think that handling all 
cases with literals in `InSet` is a clean solution. I just want to know what 
happens to the config property.
   
   @rxin @gatorsmile @dbtsai @cloud-fan @viirya @kiszk what do you think?
   
   2) Yep, there are definitely ways to speed up longs. We should consider 
your ideas as well as the idea mentioned by @dbtsai earlier. I think that 
should be a follow-up PR.
