cloud-fan commented on issue #27150: [SPARK-30471][SQL] Fix issue when 
comparing String and IntegerType
URL: https://github.com/apache/spark/pull/27150#issuecomment-589560363
 
 
   Since this has already missed 3.0, I'm wondering whether we should make a more 
thorough change to fix this problem.
   
   It's a terrible choice to compare a string with a numeric value, as the string 
content is unknown and we need to think about many corner cases.
   
   **case 1: compare string and integer:**
   The string can be a very large number beyond Long.Max, or it can be a 
fractional number.
   I think it's better to ansi_cast both sides to long, and fail if the string 
content exceeds Long.Max or is a fraction. This is also the behavior of pgsql:
   ```
   cloud0fan=# select '2' > 1;
    ?column? 
   ----------
    t
   (1 row)
   
   cloud0fan=# select '2.2' > 1;
   ERROR:  invalid input syntax for integer: "2.2"
   LINE 1: select '2.2' > 1;
   ```
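
To make the proposed semantics concrete, here is a minimal Python sketch of case 1. It is not Spark code: `strict_cast_to_long` and `string_greater_than_long` are hypothetical names, and the exact set of accepted string formats (surrounding whitespace, leading sign) is an assumption.

```python
import re

LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1

# Assumption: only an optional sign followed by ASCII digits counts as an integer.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def strict_cast_to_long(text: str) -> int:
    """Hypothetical strict cast: fail on fractions, garbage, and overflow."""
    stripped = text.strip()
    if not _INTEGER.fullmatch(stripped):
        raise ValueError(f"invalid input syntax for integer: {text!r}")
    value = int(stripped)
    if not LONG_MIN <= value <= LONG_MAX:
        raise OverflowError(f"value out of range for long: {text!r}")
    return value


def string_greater_than_long(text: str, number: int) -> bool:
    """Compare a string with a long by strictly casting the string first."""
    return strict_cast_to_long(text) > number
```

With this sketch, `string_greater_than_long("2", 1)` returns `True`, while `"2.2"` and `"9223372036854775808"` both raise instead of silently producing a result.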
   
   **case 2: compare string and float/double:**
   Similarly, ansi_cast both sides to double, as it's the widest type.
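
A rough Python sketch of case 2, again with hypothetical function names. Python's `float` is an IEEE 754 double, but it also accepts spellings such as `"nan"` and `"inf"`; whether the real cast should accept those is left open here.

```python
def strict_cast_to_double(text: str) -> float:
    """Hypothetical strict cast: fail if the string is not a valid double."""
    try:
        # Note: float() also accepts "nan", "inf" and underscores in digits.
        return float(text.strip())
    except ValueError:
        raise ValueError(f"invalid input syntax for double: {text!r}") from None


def string_greater_than_double(text: str, number: float) -> bool:
    """Compare a string with a float/double by casting both sides to double."""
    return strict_cast_to_double(text) > float(number)
```

Unlike the long case, a fractional string such as `"2.2"` or a value beyond the long range is accepted here, because double can represent it (possibly with rounding).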
   
   **case 3: compare string and decimal:**
   Decimal is an exact numeric type, so precision loss is not acceptable. I think we 
should ansi_cast both sides to `decimal(max_precision, original_scale)`.
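
A minimal Python sketch of case 3 using the standard `decimal` module. The function names are hypothetical. The sketch assumes a maximum precision of 38 digits, takes `original_scale` from the decimal operand, and assumes that extra fractional digits in the string are rounded half-up rather than rejected.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP, localcontext

MAX_PRECISION = 38  # assumed maximum decimal precision


def strict_cast_to_decimal(text: str, scale: int) -> Decimal:
    """Hypothetical strict cast to decimal(MAX_PRECISION, scale)."""
    try:
        value = Decimal(text.strip())  # the constructor is exact, no rounding
    except InvalidOperation:
        raise ValueError(f"invalid input syntax for decimal: {text!r}") from None
    if not value.is_finite():
        raise ValueError(f"invalid input syntax for decimal: {text!r}")
    with localcontext() as ctx:
        ctx.prec = MAX_PRECISION
        try:
            # quantize() signals InvalidOperation if the result would need
            # more than ctx.prec digits, which serves as the overflow check.
            return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
        except InvalidOperation:
            raise OverflowError(
                f"{text!r} does not fit in decimal({MAX_PRECISION}, {scale})"
            ) from None


def string_greater_than_decimal(text: str, number: Decimal) -> bool:
    """Compare a string with a decimal, keeping the decimal's own scale."""
    scale = max(0, -number.as_tuple().exponent)
    return strict_cast_to_decimal(text, scale) > number
```

Because no double is involved, `string_greater_than_decimal("9007199254740993", Decimal("9007199254740992"))` is `True`, even though these two values are indistinguishable as doubles.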
   
   More importantly, we should only allow the comparison for string literals, 
as many other SQL systems do. The content of a non-literal string is unknown and 
the cast is very likely to fail at runtime; allowing only string literals lets us 
fail earlier, at compile time.
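
The literal-only rule can be illustrated with a toy Python sketch. Every class and function name below is hypothetical and does not correspond to Spark's analyzer API. The idea is that a literal's content is known when the query is analyzed, so a bad value is reported before any data is read.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StringLiteral:
    value: str  # content is known at analysis time


@dataclass(frozen=True)
class StringColumn:
    name: str  # content is only known at runtime


class AnalysisError(Exception):
    """Raised while the query is analyzed, before execution starts."""


def coerce_for_long_comparison(operand: Union[StringLiteral, StringColumn]) -> int:
    """Hypothetical analysis-time check for `<string> <op> <long>`."""
    if isinstance(operand, StringColumn):
        raise AnalysisError(
            f"cannot compare string column {operand.name!r} with a long; "
            "add an explicit cast"
        )
    try:
        return int(operand.value)
    except ValueError:
        raise AnalysisError(
            f"invalid input syntax for integer: {operand.value!r}"
        ) from None
```

Here `StringLiteral("2")` is folded to `2` during analysis, while `StringLiteral("2.2")` and any `StringColumn` are rejected up front.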
   
   also cc @maropu @viirya 
