[jira] [Updated] (ARROW-2538) [Java] Introduce BaseWriter.writeNull method

2018-05-03 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated ARROW-2538:
--
Description: When a data set has null values inside a complex data type, it 
is hard to call the proper writeNull method, because writeNull is declared in 
AbstractFieldWriter, which is package-local rather than public. Moreover, 
UnionListWriter does not implement writeNull, so the call falls through to 
AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by 
default. BaseWriter.writeNull is therefore required to write null values 
inside complex data types.  (was: When a data set has null values inside a 
complex data type, it is hard to call the proper writeNull method, because 
writeNull is declared in AbstractFieldWriter, which is package-local rather 
than public. Moreover, UnionListWriter does not implement writeNull, so the 
call falls through to AbstractFieldWriter.writeNull, which throws an 
IllegalArgumentException by default. FieldWriter.writeNull is therefore 
required to write null values inside complex data types.)
Summary: [Java] Introduce BaseWriter.writeNull method  (was: [Java] 
Introduce FieldWriter.writeNull method)

> [Java] Introduce BaseWriter.writeNull method
> 
>
> Key: ARROW-2538
> URL: https://issues.apache.org/jira/browse/ARROW-2538
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java - Vectors
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
>
> When a data set has null values inside a complex data type, it is hard to 
> call the proper writeNull method, because writeNull is declared in 
> AbstractFieldWriter, which is package-local rather than public. Moreover, 
> UnionListWriter does not implement writeNull, so the call falls through to 
> AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by 
> default. BaseWriter.writeNull is therefore required to write null values 
> inside complex data types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2538) [Java] Introduce FieldWriter.writeNull method

2018-05-03 Thread Teddy Choi (JIRA)
Teddy Choi created ARROW-2538:
-

 Summary: [Java] Introduce FieldWriter.writeNull method
 Key: ARROW-2538
 URL: https://issues.apache.org/jira/browse/ARROW-2538
 Project: Apache Arrow
 Issue Type: Improvement
 Components: Java - Vectors
 Reporter: Teddy Choi
 Assignee: Teddy Choi


When a data set has null values inside a complex data type, it is hard to 
call the proper writeNull method, because writeNull is declared in 
AbstractFieldWriter, which is package-local rather than public. Moreover, 
UnionListWriter does not implement writeNull, so the call falls through to 
AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by 
default. FieldWriter.writeNull is therefore required to write null values 
inside complex data types.





[jira] [Assigned] (ARROW-2278) [Python] deserializing Numpy struct arrays raises

2018-05-03 Thread Licht Takeuchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Licht Takeuchi reassigned ARROW-2278:
-

Assignee: Licht Takeuchi

> [Python] deserializing Numpy struct arrays raises
> -
>
> Key: ARROW-2278
> URL: https://issues.apache.org/jira/browse/ARROW-2278
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Antoine Pitrou
> Assignee: Licht Takeuchi
> Priority: Major
>
> {code:python}
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
> >>> arr = np.arange(5*10, dtype=np.int8).view(dt)
> >>> pa.deserialize(pa.serialize(arr).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in 
>     pa.deserialize(pa.serialize(arr).to_buffer())
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in _deserialize_numpy_array_list
>     return np.array(data[0], dtype=np.dtype(data[1]))
> TypeError: a bytes-like object is required, not 'int'
> {code}





[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-03 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463370#comment-16463370
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 5:09 AM:
---

SparseDataFrame is planned to be deprecated in pandas.
 [https://github.com/pandas-dev/pandas/issues/19239]


was (Author: licht-t):
SparseDataFrame is planned to be deprecated.
[https://github.com/pandas-dev/pandas/issues/19239]

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Mitar
> Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in 
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1





[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-03 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463370#comment-16463370
 ] 

Licht Takeuchi commented on ARROW-2273:
---

SparseDataFrame is planned to be deprecated.
[https://github.com/pandas-dev/pandas/issues/19239]

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Mitar
> Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in 
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1





[jira] [Commented] (ARROW-1661) [Python] Compile and test with Python 3.7

2018-05-03 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462745#comment-16462745
 ] 

Antoine Pitrou commented on ARROW-1661:
---

I ran into linking issues while trying to compile Arrow against a self-compiled 
Python 3.7:
{code}
[3/3] Linking CXX executable debug/python-test
FAILED: : && /usr/bin/ccache /usr/bin/c++   -ggdb -O0  -Wall -std=c++11 -msse3  
-g   src/arrow/python/CMakeFiles/python-test.dir/python-test.cc.o  -o 
debug/python-test  -rdynamic debug/libarrow_python_test_main.a 
debug/libarrow_python.so.10.0.0 debug/libarrow.so.10.0.0 
/home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a
 googletest_ep-prefix/src/googletest_ep/lib/libgtest.a -lpthread -ldl 
zstd_ep-prefix/src/zstd_ep/lib/libzstd.a zlib_ep/src/zlib_ep-install/lib/libz.a 
snappy_ep/src/snappy_ep-install/lib/libsnappy.a 
lz4_ep-prefix/src/lz4_ep/lib/liblz4.a 
brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlidec.a 
brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlienc.a 
brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu/libbrotlicommon.a 
-lpthread 
-Wl,-rpath,/home/antoine/t/ttarrow/cpp/build/debug:/home/antoine/miniconda3/envs/pyarrow/lib
 -Wl,-rpath-link,/home/antoine/miniconda3/envs/pyarrow/lib && :
/home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a(posixmodule.o): In function `os_openpty_impl':
/home/antoine/cpython/37/./Modules/posixmodule.c:6140: undefined reference to `openpty'
/home/antoine/cpython/37/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.a(posixmodule.o): In function `os_forkpty_impl':
/home/antoine/cpython/37/./Modules/posixmodule.c:6234: undefined reference to `forkpty'
{code}

The problem here is that Python needs to link against {{libutil}} on Linux. 
This can be queried using the {{sysconfig}} module:
{code}
>>> import sysconfig
>>> sysconfig.get_config_var('LIBS')
'-lpthread -ldl  -lutil'
>>> sysconfig.get_config_var('SHLIBS')
'-lpthread -ldl  -lutil'
{code}

However, we need to turn those linker flags into library arguments that CMake 
understands. Just passing the flags to the {{ADD_ARROW_LIB}} call doesn't seem 
to work.
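
One possible way to do the conversion is to split the flag string and keep only the "-l" entries, which gives bare library names that a CMake target could link against. The following is a minimal sketch under that assumption. The helper name libs_from_flags is hypothetical, and how the resulting names would be passed to {{ADD_ARROW_LIB}} is left open.

```python
import shlex
import sysconfig


def libs_from_flags(flags):
    """Return library names found in a linker flag string.

    Hypothetical helper: '-lpthread -ldl  -lutil' becomes
    ['pthread', 'dl', 'util'].
    """
    names = []
    # The config variable may be None or empty on some platforms.
    for token in shlex.split(flags or ""):
        # Keep only "-lNAME" tokens; other flags (-L..., -Wl,...) are skipped.
        if token.startswith("-l") and len(token) > 2:
            name = token[2:]
            if name not in names:  # preserve order, drop duplicates
                names.append(name)
    return names


if __name__ == "__main__":
    libs = libs_from_flags(sysconfig.get_config_var("LIBS"))
    # CMake represents lists as semicolon-separated strings.
    print(";".join(libs))
```

The printed string depends on how the interpreter was built, so it would have to be queried from the same Python that Arrow is compiled against.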

> [Python] Compile and test with Python 3.7
> -
>
> Key: ARROW-1661
> URL: https://issues.apache.org/jira/browse/ARROW-1661
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1125





[jira] [Resolved] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-05-03 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2516.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1989
[https://github.com/apache/arrow/pull/1989]

> AppVeyor Build Matrix should be specific to the changes made in a PR
> 
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-2389) [C++] Add StatusCode::OverflowError

2018-05-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2389:
--
Labels: pull-request-available  (was: )

> [C++] Add StatusCode::OverflowError
> ---
>
> Key: ARROW-2389
> URL: https://issues.apache.org/jira/browse/ARROW-2389
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>
> It may be useful to have a {{StatusCode::OverflowError}} return code, to 
> signal that something overflowed allowed limits (e.g. the 2GB limit for 
> string or binary values).
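
To illustrate the kind of condition such a status code could report: Arrow string and binary arrays address their data with signed 32-bit offsets, so the data of one array cannot exceed 2^31 - 1 bytes. The sketch below models that check in Python rather than C++. All names in it (INT32_MAX, CapacityOverflow, check_append) are hypothetical and are not part of the Arrow API.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


class CapacityOverflow(Exception):
    """Hypothetical stand-in for an overflow status code."""


def check_append(current_bytes, value):
    """Return the new total byte count, or raise if the limit is exceeded.

    `current_bytes` is the number of data bytes already stored and
    `value` is the bytes object about to be appended.
    """
    new_total = current_bytes + len(value)
    if new_total > INT32_MAX:
        raise CapacityOverflow(
            f"appending {len(value)} bytes would exceed {INT32_MAX} bytes"
        )
    return new_total
```

A dedicated error type lets callers tell "the input is too large" apart from other failures and react to it, for example by splitting the data into several chunks.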





[jira] [Resolved] (ARROW-1886) [Python] Add function to "flatten" structs within tables

2018-05-03 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-1886.
---
Resolution: Fixed

Issue resolved by pull request 1768
[https://github.com/apache/arrow/pull/1768]

> [Python] Add function to "flatten" structs within tables
> 
>
> Key: ARROW-1886
> URL: https://issues.apache.org/jira/browse/ARROW-1886
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> See discussion in https://issues.apache.org/jira/browse/ARROW-1873
> When a user has a struct column, it may be more efficient to flatten the 
> struct into multiple columns named {{struct_name.field_name}}, one for each 
> field in the struct. Then, when {{to_pandas}} is called, Python dictionaries 
> do not have to be created, and the conversion is much more efficient.
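
The naming scheme described above can be sketched in plain Python. This is only a model of the idea, not the pyarrow implementation: a table is represented as a dict of equal-length lists, a struct column as a list of dicts (with None for a null struct), and the function name flatten_struct_column is hypothetical.

```python
def flatten_struct_column(table, name):
    """Replace the struct column `name` with one column per struct field.

    `table` maps column names to equal-length lists. The struct column is a
    list of dicts, where None stands for a null struct.
    """
    flat = {}
    for col, values in table.items():
        if col != name:
            flat[col] = values
            continue
        # Collect field names in first-seen order across all rows.
        fields = []
        for row in values:
            for field in (row or {}):
                if field not in fields:
                    fields.append(field)
        # Emit one flat column per field, named "struct_name.field_name".
        for field in fields:
            flat[f"{name}.{field}"] = [
                None if row is None else row.get(field) for row in values
            ]
    return flat


table = {
    "id": [1, 2, 3],
    "point": [{"x": 1, "y": 2.0}, None, {"x": 3, "y": 4.0}],
}
print(flatten_struct_column(table, "point"))
```

After flattening, every column holds plain scalars, so no per-row dictionary has to be built when the data is handed over to pandas.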





[jira] [Updated] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-05-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2516:
--
Labels: pull-request-available  (was: )

> AppVeyor Build Matrix should be specific to the changes made in a PR
> 
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>






[jira] [Assigned] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-05-03 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2516:
-

Assignee: Antoine Pitrou

> AppVeyor Build Matrix should be specific to the changes made in a PR
> 
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Paddy Horan
> Assignee: Antoine Pitrou
> Priority: Major
>






[jira] [Assigned] (ARROW-2536) [Rust] ListBuilder uses wrong initial size for offset builder

2018-05-03 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2536:
--

Assignee: Kane Kim

> [Rust] ListBuilder uses wrong initial size for offset builder
> -
>
> Key: ARROW-2536
> URL: https://issues.apache.org/jira/browse/ARROW-2536
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andy Grove
> Assignee: Kane Kim
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-2536) [Rust] ListBuilder uses wrong initial size for offset builder

2018-05-03 Thread Kane Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462020#comment-16462020
 ] 

Kane Kim commented on ARROW-2536:
-

[~xhochy] This is Kane, you can assign to me. Thanks!

> [Rust] ListBuilder uses wrong initial size for offset builder
> -
>
> Key: ARROW-2536
> URL: https://issues.apache.org/jira/browse/ARROW-2536
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andy Grove
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>



