[jira] [Comment Edited] (ARROW-4497) [C++] Determine how we want to handle hashing of floating point edge cases

Micah Kornfield (JIRA) Thu, 07 Feb 2019 01:58:51 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762523#comment-16762523
 ]


Micah Kornfield edited comment on ARROW-4497 at 2/7/19 9:49 AM:
----------------------------------------------------------------

[~pitrou] This is consistent with how Java handles things 
(https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#equals(java.lang.Object))
 so I'm OK with the approach. It seems pandas takes another approach (but I 
could be misusing the groupby operator):

{{>>> df = pd.DataFrame({'a': [-0.0, 0.0, float('nan')]})
>>> df
     a
0 -0.0
1  0.0
2  NaN
>>> a=df.groupby(df.a)
>>> a.count(
... )
Empty DataFrame
Columns: []
Index: [-0.0]}}
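
For comparison, here is a rough Python sketch of the Java rule linked above. Per the Double.equals documentation, two doubles are equal when Double.doubleToLongBits returns the same value for both, which makes NaN equal to NaN and keeps 0.0 and -0.0 distinct. The helper names are hypothetical and this is only an illustration of that rule, not Arrow or pandas code.

```python
import math
import struct

# Value that Java's Double.doubleToLongBits documents for every NaN input.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def double_to_long_bits(x: float) -> int:
    """Hypothetical stand-in for Double.doubleToLongBits: IEEE 754 bits, NaNs collapsed."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def java_style_equals(a: float, b: float) -> bool:
    """Equality by canonicalized bit pattern, as Double.equals is documented to do."""
    return double_to_long_bits(a) == double_to_long_bits(b)


if __name__ == "__main__":
    nan = float("nan")
    print(nan == nan, java_style_equals(nan, nan))    # False True
    print(0.0 == -0.0, java_style_equals(0.0, -0.0))  # True False
```

Under this rule a hash table keyed on the bit pattern would keep -0.0 and 0.0 in separate slots and put every NaN in one slot.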


> [C++] Determine how we want to handle hashing of floating point edge cases
> --------------------------------------------------------------------------
>
>                 Key: ARROW-4497
>                 URL: https://issues.apache.org/jira/browse/ARROW-4497
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Micah Kornfield
>            Priority: Major
>              Labels: analytics
>             Fix For: 0.14.0
>
>
> We should document expected behavior or implement improvements to the 
> floating-point hashing code:
> 1. -0.0 and 0.0 (should these be collapsed to 0.0?)
> 2. NaN (should we reduce to a single canonical version?)
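
The two questions in the description can be made concrete with a small Python sketch. It normalizes a double before deriving the key used for hashing, with one flag per edge case. The function names and flags are hypothetical, and this is not how the Arrow C++ hashing code is written; it only shows how each choice changes the resulting groups.

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FF8000000000000  # one fixed quiet-NaN bit pattern


def hash_key(x: float, collapse_zero: bool = True, canonical_nan: bool = True) -> int:
    """Return the 64-bit pattern to hash on, after optional normalization."""
    if canonical_nan and math.isnan(x):
        return CANONICAL_NAN_BITS  # every NaN payload maps to the same key
    if collapse_zero and x == 0.0:
        x = 0.0  # -0.0 == 0.0 is True, so this replaces -0.0 with +0.0
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def group_counts(values, **options):
    """Count values per key, the way a hash-based group-by would bucket them."""
    counts = {}
    for v in values:
        key = hash_key(v, **options)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    values = [-0.0, 0.0, float("nan")]
    print(len(group_counts(values)))                       # 2 groups
    print(len(group_counts(values, collapse_zero=False)))  # 3 groups
```

With both flags on, the three values from the pandas example fall into two groups (zero and NaN); with collapse_zero off, -0.0 and 0.0 are kept apart, which matches the Java-style behavior.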



