[jira] [Updated] (ARROW-2538) [Java] Introduce BaseWriter.writeNull method
[ https://issues.apache.org/jira/browse/ARROW-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated ARROW-2538:
------------------------------
Description: When a data set has null values in complex data type, it's hard to call proper writeNull method. Because writeNull is declared in AbstractFieldWriter, which is package local, not public. Moreover, UnionListWriter has no implementation of writeNull method, so it falls into AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by default. So BaseWriter.writeNull is required to write null values inside of complex data types.
(was: When a data set has null values in complex data type, it's hard to call proper writeNull method. Because writeNull is declared in AbstractFieldWriter, which is package local, not public. Moreover, UnionListWriter has no implementation of writeNull method, so it falls into AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by default. So FieldWriter.writeNull is required to write null values inside of complex data types.)
Summary: [Java] Introduce BaseWriter.writeNull method (was: [Java] Introduce FieldWriter.writeNull method)

> [Java] Introduce BaseWriter.writeNull method
>
>
> Key: ARROW-2538
> URL: https://issues.apache.org/jira/browse/ARROW-2538
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java - Vectors
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
>
> When a data set has null values in complex data type, it's hard to call
> proper writeNull method. Because writeNull is declared in
> AbstractFieldWriter, which is package local, not public. Moreover,
> UnionListWriter has no implementation of writeNull method, so it falls into
> AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by
> default. So BaseWriter.writeNull is required to write null values inside of
> complex data types.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2538) [Java] Introduce FieldWriter.writeNull method
Teddy Choi created ARROW-2538:
------------------------------
Summary: [Java] Introduce FieldWriter.writeNull method
Key: ARROW-2538
URL: https://issues.apache.org/jira/browse/ARROW-2538
Project: Apache Arrow
Issue Type: Improvement
Components: Java - Vectors
Reporter: Teddy Choi
Assignee: Teddy Choi

When a data set has null values in complex data type, it's hard to call proper writeNull method. Because writeNull is declared in AbstractFieldWriter, which is package local, not public. Moreover, UnionListWriter has no implementation of writeNull method, so it falls into AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by default. So FieldWriter.writeNull is required to write null values inside of complex data types.
[jira] [Assigned] (ARROW-2278) [Python] deserializing Numpy struct arrays raises
[ https://issues.apache.org/jira/browse/ARROW-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Licht Takeuchi reassigned ARROW-2278:
-------------------------------------
Assignee: Licht Takeuchi

> [Python] deserializing Numpy struct arrays raises
> -------------------------------------------------
>
> Key: ARROW-2278
> URL: https://issues.apache.org/jira/browse/ARROW-2278
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Antoine Pitrou
> Assignee: Licht Takeuchi
> Priority: Major
>
> {code:python}
> >>> import numpy as np
> >>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
> >>> arr = np.arange(5*10, dtype=np.int8).view(dt)
> >>> pa.deserialize(pa.serialize(arr).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in
>     pa.deserialize(pa.serialize(arr).to_buffer())
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in _deserialize_numpy_array_list
>     return np.array(data[0], dtype=np.dtype(data[1]))
> TypeError: a bytes-like object is required, not 'int'
> {code}
[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463370#comment-16463370 ]

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 5:09 AM:
---------------------------------------------------------------

SparseDataFrame is planned to be deprecated in pandas.
[https://github.com/pandas-dev/pandas/issues/19239]

was (Author: licht-t):
SparseDataFrame is planned to be deprecated.
[https://github.com/pandas-dev/pandas/issues/19239]

> Cannot deserialize pandas SparseDataFrame
> -----------------------------------------
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Mitar
> Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463370#comment-16463370 ]

Licht Takeuchi commented on ARROW-2273:
---------------------------------------

SparseDataFrame is planned to be deprecated.
[https://github.com/pandas-dev/pandas/issues/19239]

> Cannot deserialize pandas SparseDataFrame
> -----------------------------------------
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Mitar
> Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1
[jira] [Commented] (ARROW-1661) [Python] Compile and test with Python 3.7
[ https://issues.apache.org/jira/browse/ARROW-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462745#comment-16462745 ]

Antoine Pitrou commented on ARROW-1661:
---------------------------------------

I ran into linking issues while trying to compile Arrow against a self-compiled Python 3.7:
{code}
[3/3] Linking CXX executable debug/python-test
FAILED: : && /usr/bin/ccache /usr/bin/c++ -ggdb -O0 -Wall -std=c++11 -msse3 -g src/arrow/python/CMakeFiles/python-test.dir/python-test.cc.o -o debug/python-test -rdynamic debug/libarrow_python_test_main.a debug/libarrow_python.so.10.0.0 debug/libarrow.so.10.0.0 /home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a googletest_ep-prefix/src/googletest_ep/lib/libgtest.a -lpthread -ldl zstd_ep-prefix/src/zstd_ep/lib/libzstd.a zlib_ep/src/zlib_ep-install/lib/libz.a snappy_ep/src/snappy_ep-install/lib/libsnappy.a lz4_ep-prefix/src/lz4_ep/lib/liblz4.a brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlidec.a brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlienc.a brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlicommon.a -lpthread -Wl,-rpath,/home/antoine/t/ttarrow/cpp/build/debug:/home/antoine/miniconda3/envs/pyarrow/lib -Wl,-rpath-link,/home/antoine/miniconda3/envs/pyarrow/lib && :
/home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a(posixmodule.o): In function `os_openpty_impl':
/home/antoine/cpython/37/./Modules/posixmodule.c:6140: undefined reference to `openpty'
/home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a(posixmodule.o): In function `os_forkpty_impl':
/home/antoine/cpython/37/./Modules/posixmodule.c:6234: undefined reference to `forkpty'
{code}
The problem here is that Python needs to link against {{libutil}} on Linux. This can be queried using the {{sysconfig}} module:
{code}
>>> sysconfig.get_config_var('LIBS')
'-lpthread -ldl -lutil'
>>> sysconfig.get_config_var('SHLIBS')
'-lpthread -ldl -lutil'
{code}
However we need to turn those command-line arguments into library arguments for CMake. Just passing the linker flags to the {{ADD_ARROW_LIB}} call doesn't seem to work.

> [Python] Compile and test with Python 3.7
> -----------------------------------------
>
> Key: ARROW-1661
> URL: https://issues.apache.org/jira/browse/ARROW-1661
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1125
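As a rough illustration of the step described in the ARROW-1661 comment (turning the {{-l...}} flags reported by {{sysconfig}} into plain library names that a CMake list could carry), here is a minimal sketch. The helper names {{link_libraries_from_flags}} and {{cmake_list}} are hypothetical and are not part of Arrow's build scripts; how the resulting list would be passed to CMake is left open, since the comment does not say.

```python
import shlex
import sysconfig


def link_libraries_from_flags(flags):
    """Split a linker flag string into bare library names and leftover flags.

    `flags` is a string such as '-lpthread -ldl -lutil'; None is treated as
    empty, because sysconfig may not define LIBS on every platform.
    """
    libs, other = [], []
    for token in shlex.split(flags or ""):
        if token.startswith("-l") and len(token) > 2:
            libs.append(token[2:])  # '-lutil' -> 'util'
        else:
            other.append(token)  # keep anything that is not a -l<name> flag
    return libs, other


def cmake_list(items):
    """Join items with ';', the separator CMake uses for list values."""
    return ";".join(items)


# Flags as quoted in the comment above.
libs, other = link_libraries_from_flags("-lpthread -ldl -lutil")
print(cmake_list(libs))  # prints "pthread;dl;util"

# On a real interpreter the input would come from sysconfig instead:
current_libs, _ = link_libraries_from_flags(sysconfig.get_config_var("LIBS"))
```

Splitting with {{shlex}} rather than {{str.split}} keeps quoted arguments intact, which matters if a platform reports paths containing spaces.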
[jira] [Resolved] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR
[ https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved ARROW-2516.
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1989
[https://github.com/apache/arrow/pull/1989]

> AppVeyor Build Matrix should be specific to the changes made in a PR
>
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
[jira] [Updated] (ARROW-2389) [C++] Add StatusCode::OverflowError
[ https://issues.apache.org/jira/browse/ARROW-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2389:
----------------------------------
Labels: pull-request-available (was: )

> [C++] Add StatusCode::OverflowError
> -----------------------------------
>
> Key: ARROW-2389
> URL: https://issues.apache.org/jira/browse/ARROW-2389
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>
> It may be useful to have a {{StatusCode::OverflowError}} return code, to
> signal that something overflowed allowed limits (e.g. the 2GB limit for
> string or binary values).
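To make the ARROW-2389 wish concrete, the following is a small sketch of what a dedicated overflow status could signal. It is written in Python for brevity and does not reflect Arrow's C++ API: the {{StatusCode}} enum, its members, and {{check_append}} are all hypothetical, and the limit assumes the "2GB limit" mentioned above corresponds to signed 32-bit offsets.

```python
from enum import Enum

# Assumption: the 2GB limit corresponds to signed 32-bit offsets.
MAX_BINARY_BYTES = 2**31 - 1


class StatusCode(Enum):
    """Hypothetical status codes, for illustration only."""
    OK = 0
    OVERFLOW_ERROR = 1


def check_append(current_total_bytes, value):
    """Return OVERFLOW_ERROR if appending `value` would exceed the limit."""
    if current_total_bytes + len(value) > MAX_BINARY_BYTES:
        return StatusCode.OVERFLOW_ERROR
    return StatusCode.OK


print(check_append(0, b"abc"))
print(check_append(MAX_BINARY_BYTES - 2, b"abc"))
```

A distinct code lets a caller tell "the data does not fit" apart from an invalid argument, for example to fall back to splitting the data into several chunks.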
[jira] [Resolved] (ARROW-1886) [Python] Add function to "flatten" structs within tables
[ https://issues.apache.org/jira/browse/ARROW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved ARROW-1886.
-----------------------------------
Resolution: Fixed

Issue resolved by pull request 1768
[https://github.com/apache/arrow/pull/1768]

> [Python] Add function to "flatten" structs within tables
>
>
> Key: ARROW-1886
> URL: https://issues.apache.org/jira/browse/ARROW-1886
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> See discussion in https://issues.apache.org/jira/browse/ARROW-1873
> When a user has a struct column, it may be more efficient to flatten the
> struct into multiple columns of the form {{struct_name.field_name}} for each
> field in the struct. Then when you call {{to_pandas}}, Python dictionaries do
> not have to be created, and the conversion will be much more efficient
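The flattening described in ARROW-1886 can be pictured with plain Python containers. This is only a sketch of the idea, not the pyarrow implementation: the function {{flatten_struct_columns}} is hypothetical, a table is modelled as a dict of column name to list of values, and a struct column is modelled as a list of dicts (with None standing for a null struct).

```python
def flatten_struct_columns(columns):
    """Expand struct-like columns into 'struct_name.field_name' columns.

    `columns` maps a column name to a list of values. A column counts as a
    struct column when it holds at least one dict and nothing but dicts or
    None. Other columns are passed through unchanged.
    """
    flat = {}
    for name, values in columns.items():
        is_struct = any(isinstance(v, dict) for v in values) and all(
            v is None or isinstance(v, dict) for v in values
        )
        if not is_struct:
            flat[name] = values
            continue
        # Collect field names in first-seen order.
        field_names = []
        for v in values:
            for key in v or {}:
                if key not in field_names:
                    field_names.append(key)
        for field in field_names:
            flat[f"{name}.{field}"] = [
                None if v is None else v.get(field) for v in values
            ]
    return flat


table = {
    "id": [1, 2, 3],
    "point": [{"x": 1, "y": 2.0}, None, {"x": 3, "y": 4.0}],
}
print(flatten_struct_columns(table))
```

After flattening, every column holds scalars only, which is the property the issue relies on to avoid building one Python dictionary per row during {{to_pandas}}.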
[jira] [Updated] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR
[ https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2516:
----------------------------------
Labels: pull-request-available (was: )

> AppVeyor Build Matrix should be specific to the changes made in a PR
>
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>
[jira] [Assigned] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR
[ https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou reassigned ARROW-2516:
-------------------------------------
Assignee: Antoine Pitrou

> AppVeyor Build Matrix should be specific to the changes made in a PR
>
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
>
[jira] [Assigned] (ARROW-2536) [Rust] ListBuilder uses wrong initial size for offset builder
[ https://issues.apache.org/jira/browse/ARROW-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn reassigned ARROW-2536:
----------------------------------
Assignee: Kane Kim

> [Rust] ListBuilder uses wrong initial size for offset builder
> -------------------------------------------------------------
>
> Key: ARROW-2536
> URL: https://issues.apache.org/jira/browse/ARROW-2536
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andy Grove
> Assignee: Kane Kim
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
[jira] [Commented] (ARROW-2536) [Rust] ListBuilder uses wrong initial size for offset builder
[ https://issues.apache.org/jira/browse/ARROW-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462020#comment-16462020 ]

Kane Kim commented on ARROW-2536:
---------------------------------

[~xhochy] This is Kane, you can assign to me. Thanks!

> [Rust] ListBuilder uses wrong initial size for offset builder
> -------------------------------------------------------------
>
> Key: ARROW-2536
> URL: https://issues.apache.org/jira/browse/ARROW-2536
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andy Grove
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>