Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4945#issuecomment-89621194
  
    The thing that makes me hesitant here is whether we should stick to Hive, 
because Hive's behavior is actually error-prone and unintuitive. In Hive, `IN` 
is implemented as a UDF, so function argument type coercion rules apply.
    
    Take `"1.00" IN (1.0, 2.0)` as an example: `"1.00"`, `1.0`, and `2.0` are 
all arguments of `GenericUDFIn`. During type coercion, `1.0` and `2.0` are 
first converted to the strings `"1.0"` and `"2.0"`, and then compared with 
`"1.00"`, so the expression returns false.
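    To make the surprise concrete, here is a minimal sketch of that coercion in plain Python (not Hive's actual `GenericUDFIn` code): the numeric operands are stringified first, and the membership test then falls back to string equality.

```python
# Sketch of Hive's coercion for "1.00" IN (1.0, 2.0):
# the doubles are converted to strings before comparison.
left = "1.00"
coerced = [str(x) for x in (1.0, 2.0)]  # ["1.0", "2.0"]

# String equality: "1.00" != "1.0", so the IN test is false,
# even though numerically 1.00 == 1.0.
result = left in coerced
print(result)  # False
```

    A user who expects numeric comparison (where `1.00 = 1.0` holds) gets false here, which is exactly the unintuitive behavior described above.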
    
    Personally, I think maybe we should just throw an exception if the left side 
of `IN` has a different data type than the right side.

