nchammas commented on PR #53695:
URL: https://github.com/apache/spark/pull/53695#issuecomment-4535124818

   @asugranyes - I am taking another look at these changes and I am confused by 
something. Apparently, array options were part of the work I did earlier, and I 
even added [tests for the same cases you are addressing here][1].
   
   In fact, if you run this on 4.1.2 you will see that the appropriate 
normalization is already happening:
   
   ```sql
   spark-sql (default)> select array_distinct(array(0.0, -0.0, -0.0, DOUBLE("NaN"), DOUBLE("NaN")));
   [0.0,NaN]
   Time taken: 0.072 seconds, Fetched 1 row(s)
   ```
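
For anyone following along, here is a rough sketch in plain Python (not Spark code) of what "normalization" means for this case. The helper names `normalize_double`, `bits`, and `distinct_doubles` are made up for illustration, and this is only an assumption about the general idea, not how Spark implements `array_distinct`: map -0.0 to 0.0 and every NaN to one canonical NaN, then deduplicate on the raw bit pattern.

```python
import math
import struct


def normalize_double(x: float) -> float:
    """Map -0.0 to 0.0 and every NaN to one canonical NaN (illustrative helper)."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return 0.0  # -0.0 == 0.0 is True, so this collapses -0.0 into +0.0
    return x


def bits(x: float) -> bytes:
    """Raw IEEE 754 bytes of a double, so that comparisons are bitwise."""
    return struct.pack(">d", x)


def distinct_doubles(values):
    """Order-preserving distinct over the bit patterns of normalized values."""
    seen = set()
    out = []
    for v in values:
        n = normalize_double(v)
        key = bits(n)
        if key not in seen:
            seen.add(key)
            out.append(n)
    return out


values = [0.0, -0.0, -0.0, float("nan"), float("nan")]
result = distinct_doubles(values)
print(result)  # [0.0, nan]
```

Without the normalization step, a bitwise comparison would treat 0.0 and -0.0 as two different values, since their sign bits differ.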
   
   Even the more basic normalization I suggested is already in place from my 
prior work:
   
   ```sql
   spark-sql (default)> select -0, -0.0;
   0    0.0
   Time taken: 1.19 seconds, Fetched 1 row(s)
   ```
   
   So my question is: why doesn't the example in your PR description exercise 
the same code path that already normalizes -0.0 appropriately? It seems to have 
something to do with using DataFrames vs. SQL.
   
   [1]: https://github.com/apache/spark/pull/45036/changes#diff-78dcad2887766aae19456b004ceb6f6b47806611885f3ee6c2ba46668f05d17b


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

