[jira] [Comment Edited] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394627#comment-16394627 ]

Alex Hagerman edited comment on ARROW-640 at 3/11/18 9:02 PM:
--

I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.

I'm going to look at the history of __eq__ on ArrayValue and as_py then work on what would make sense for __hash__.

{code:java}
%load_ext Cython

import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr

[
  1,
  1,
  1,
  2
]

arr[0] == arr[1]
True

arr[0] == arr[3]
False

word_list = ['test', 'not the same', 'test', 'nope']
word_list[0] == word_list[2]
True

word_list[0] == word_list[1]
False

pa.array.__eq__

set(arr)
---
TypeError Traceback (most recent call last)
 in ()
> 1 set(arr)

TypeError: unhashable type: 'pyarrow.lib.Int64Value'

arr_list = pa.from_pylist([1, 1, 1, 2])
---
AttributeError Traceback (most recent call last)
 in ()
> 1 arr_list = pa.from_pylist([1, 1, 1, 2])

AttributeError: module 'pyarrow' has no attribute 'from_pylist'
{code}

was (Author: alexhagerman):
I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.

I'm going to look at the history or __eq__ on the ScalarValue and as_py then work on what would make sense for __hash__.
{code:java}
%load_ext Cython

import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr

[
  1,
  1,
  1,
  2
]

arr[0] == arr[1]
True

arr[0] == arr[3]
False

word_list = ['test', 'not the same', 'test', 'nope']
word_list[0] == word_list[2]
True

word_list[0] == word_list[1]
False

pa.array.__eq__

set(arr)
---
TypeError Traceback (most recent call last)
 in ()
> 1 set(arr)

TypeError: unhashable type: 'pyarrow.lib.Int64Value'

arr_list = pa.from_pylist([1, 1, 1, 2])
---
AttributeError Traceback (most recent call last)
 in ()
> 1 arr_list = pa.from_pylist([1, 1, 1, 2])

AttributeError: module 'pyarrow' has no attribute 'from_pylist'
{code}

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Miki Tebeka
> Assignee: Alex Hagerman
> Priority: Major
> Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
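
For anyone following the thread, here is a rough pure-Python sketch of the behavior being discussed. It is not pyarrow code: `ScalarSketch` and `EqOnly` are hypothetical stand-ins, and only the method name `as_py` is borrowed from the comment above. It shows two things under those assumptions: a class that defines `__eq__` without `__hash__` ends up unhashable in Python 3 (one plausible reason `set(arr)` raises `TypeError`), and hashing the same value that `__eq__` compares lets `set()` collapse equal scalars.

```python
class ScalarSketch:
    """Hypothetical stand-in for an Arrow scalar wrapper; not pyarrow's class."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Return the plain Python value backing this scalar.
        return self._value

    def __eq__(self, other):
        # Compare by value; accept another sketch or a plain Python value.
        if isinstance(other, ScalarSketch):
            return self.as_py() == other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Hash the same value __eq__ compares, so equal objects hash equally.
        return hash(self.as_py())


class EqOnly:
    """Defines __eq__ only, so Python 3 implicitly sets __hash__ to None."""

    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self._value == other._value


values = [ScalarSketch(v) for v in [1, 1, 1, 2]]

# Equal scalars collapse to one set entry because hash and equality agree.
unique = {v.as_py() for v in set(values)}
print(unique)  # {1, 2}

# A class with __eq__ but no __hash__ cannot be placed in a set.
print(EqOnly.__hash__ is None)  # True
```

Delegating both methods to the converted Python value is only one possible design; whether that is appropriate for nulls or nested types is left open here.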
[jira] [Comment Edited] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394627#comment-16394627 ]

Alex Hagerman edited comment on ARROW-640 at 3/11/18 9:01 PM:
--

I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.

I'm going to look at the history or __eq__ on the ScalarValue and as_py then work on what would make sense for __hash__.

{code:java}
%load_ext Cython

import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr

[
  1,
  1,
  1,
  2
]

arr[0] == arr[1]
True

arr[0] == arr[3]
False

word_list = ['test', 'not the same', 'test', 'nope']
word_list[0] == word_list[2]
True

word_list[0] == word_list[1]
False

pa.array.__eq__

set(arr)
---
TypeError Traceback (most recent call last)
 in ()
> 1 set(arr)

TypeError: unhashable type: 'pyarrow.lib.Int64Value'

arr_list = pa.from_pylist([1, 1, 1, 2])
---
AttributeError Traceback (most recent call last)
 in ()
> 1 arr_list = pa.from_pylist([1, 1, 1, 2])

AttributeError: module 'pyarrow' has no attribute 'from_pylist'
{code}

was (Author: alexhagerman):
I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.
[jira] [Comment Edited] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394627#comment-16394627 ]

Alex Hagerman edited comment on ARROW-640 at 3/11/18 8:16 PM:
--

I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.

{code:java}
%load_ext Cython

import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr

[
  1,
  1,
  1,
  2
]

arr[0] == arr[1]
True

arr[0] == arr[3]
False

word_list = ['test', 'not the same', 'test', 'nope']
word_list[0] == word_list[2]
True

word_list[0] == word_list[1]
False

pa.array.__eq__

set(arr)
---
TypeError Traceback (most recent call last)
 in ()
> 1 set(arr)

TypeError: unhashable type: 'pyarrow.lib.Int64Value'

arr_list = pa.from_pylist([1, 1, 1, 2])
---
AttributeError Traceback (most recent call last)
 in ()
> 1 arr_list = pa.from_pylist([1, 1, 1, 2])

AttributeError: module 'pyarrow' has no attribute 'from_pylist'
{code}

was (Author: alexhagerman):
I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.
[jira] [Comment Edited] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394627#comment-16394627 ]

Alex Hagerman edited comment on ARROW-640 at 3/11/18 8:13 PM:
--

I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.

was (Author: alexhagerman):
I think this has changed since the original ticket. The comparison appears to be working. Tested this with string and numbers. Also getting an error on set now. Going to continue looking into this, but if anybody has thoughts on this I'd be happy to hear them. Also from_pylist appears to have been removed, but I didn't find it searching the change log on github only an addition in 0.3.
```python
%load_ext Cython
```

```python
import pyarrow as pa

pylist = [1,1,1,2]
arr = pa.array(pylist)
arr
```

    [
      1,
      1,
      1,
      2
    ]

```python
arr[0] == arr[1]
```

    True

```python
set(arr)
```

    ---
    TypeError Traceback (most recent call last)
     in ()
    > 1 set(arr)

    TypeError: unhashable type: 'pyarrow.lib.Int64Value'

```python
arr_list = pa.from_pylist([1, 1, 1, 2])
```

    ---
    AttributeError Traceback (most recent call last)
     in ()
    > 1 arr_list = pa.from_pylist([1, 1, 1, 2])

    AttributeError: module 'pyarrow' has no attribute 'from_pylist'

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)