[ https://issues.apache.org/jira/browse/ARROW-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762523#comment-16762523 ]
Micah Kornfield edited comment on ARROW-4497 at 2/7/19 9:49 AM:
----------------------------------------------------------------

[~pitrou] This is consistent with how Java handles things (https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#equals(java.lang.Object)), so I'm OK with the approach. It seems pandas takes another approach (but I could be misusing the groupby operator):

{{>>> df = pd.DataFrame({'a': [-0.0, 0.0, float('nan')]})}}
{{>>> df}}
{{     a}}
{{0 -0.0}}
{{1  0.0}}
{{2  NaN}}
{{>>> a=df.groupby(df.a)}}
{{>>> a.count(}}
{{... )}}
{{Empty DataFrame}}
{{Columns: []}}
{{Index: [-0.0]}}

> [C++] Determine how we want to handle hashing of floating point edge cases
> --------------------------------------------------------------------------
>
>                 Key: ARROW-4497
>                 URL: https://issues.apache.org/jira/browse/ARROW-4497
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Micah Kornfield
>            Priority: Major
>              Labels: analytics
>             Fix For: 0.14.0
>
>
> We should document expected behavior or implement improvements to hashing
> floating point code:
> 1. -0.0 and 0.0 (should these be collapsed to 0.0?)
> 2. NaN (should we reduce to a single canonical version?)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
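
The two edge cases in the issue description, and the Java {{Double.equals}} behavior cited in the comment, can be illustrated with a small standard-library Python sketch. The helper names ({{raw_bits}}, {{java_bits}}, {{java_like_equals}}, {{canonical_bits}}) are hypothetical and chosen only for illustration; this is not Arrow's implementation, just one possible way to express the options under discussion.

```python
import math
import struct

# Quiet-NaN bit pattern that Java's Double.doubleToLongBits returns for any NaN.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def raw_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def java_bits(x: float) -> int:
    """Mimic Double.doubleToLongBits: all NaNs collapse, the sign of zero is kept."""
    return CANONICAL_NAN_BITS if math.isnan(x) else raw_bits(x)


def java_like_equals(a: float, b: float) -> bool:
    """Mimic Double.equals, which compares the doubleToLongBits values."""
    return java_bits(a) == java_bits(b)


def canonical_bits(x: float) -> int:
    """Hypothetical hash key that collapses -0.0 into 0.0 and all NaNs into one."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    if x == 0.0:  # true for both 0.0 and -0.0
        return 0  # bit pattern of +0.0
    return raw_bits(x)


if __name__ == "__main__":
    nan = float("nan")
    # IEEE comparison: the zeros are equal, NaN is not equal to itself.
    print(-0.0 == 0.0, nan == nan)
    # Java-style equality: the zeros differ, NaN equals NaN.
    print(java_like_equals(-0.0, 0.0), java_like_equals(nan, nan))
    # Fully canonicalized keys: both pairs collapse.
    print(canonical_bits(-0.0) == canonical_bits(0.0),
          canonical_bits(nan) == canonical_bits(-nan))
```

Under Java-style semantics, -0.0 and 0.0 would land in different hash buckets while all NaNs share one; {{canonical_bits}} shows the alternative in which both cases are collapsed.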