[jira] [Updated] (ARROW-3799) Improve `make_in_expression`

2018-11-14 Thread Siyuan Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Zhuang updated ARROW-3799:
-
Description: The `make_in_expression` in gandiva was not implemented 
correctly. Although 
[ARROW-3751|https://issues.apache.org/jira/projects/ARROW/issues/ARROW-3751] 
has fixed part of it, further improvement is still necessary. See 
`test_in_expr_todo` in 
[python/pyarrow/tests/test_gandiva.py|https://github.com/apache/arrow/pull/2936/files#diff-9ab0e0dc1f329321ff4555b043ee0f41]
 for details.  (was: The `make_in_expression` in gandiva was not implemented 
correctly. Although 
[ARROW-3751|https://issues.apache.org/jira/projects/ARROW/issues/ARROW-3751] 
has fixed part of it, further improvement is still necessary.)

> Improve `make_in_expression`
> 
>
> Key: ARROW-3799
> URL: https://issues.apache.org/jira/browse/ARROW-3799
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Gandiva
>Reporter: Siyuan Zhuang
>Priority: Major
>
> The `make_in_expression` in gandiva was not implemented correctly. Although 
> [ARROW-3751|https://issues.apache.org/jira/projects/ARROW/issues/ARROW-3751] 
> has fixed part of it, further improvement is still necessary. See 
> `test_in_expr_todo` in 
> [python/pyarrow/tests/test_gandiva.py|https://github.com/apache/arrow/pull/2936/files#diff-9ab0e0dc1f329321ff4555b043ee0f41]
>  for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3799) Improve `make_in_expression`

2018-11-14 Thread Siyuan Zhuang (JIRA)
Siyuan Zhuang created ARROW-3799:


 Summary: Improve `make_in_expression`
 Key: ARROW-3799
 URL: https://issues.apache.org/jira/browse/ARROW-3799
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Gandiva
Reporter: Siyuan Zhuang


The `make_in_expression` in gandiva was not implemented correctly. Although 
[ARROW-3751|https://issues.apache.org/jira/projects/ARROW/issues/ARROW-3751] 
has fixed part of it, further improvement is still necessary.





[jira] [Resolved] (ARROW-3751) [Python] Add more cython bindings for gandiva

2018-11-14 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3751.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2936
[https://github.com/apache/arrow/pull/2936]

> [Python] Add more cython bindings for gandiva
> -
>
> Key: ARROW-3751
> URL: https://issues.apache.org/jira/browse/ARROW-3751
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Gandiva, Python
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Some Cython bindings were lost in ARROW-3602 (MakeAdd, MakeOr, MakeIn).





[jira] [Updated] (ARROW-3798) [GLib] Add support for column type CSV read options

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3798:
--
Labels: pull-request-available  (was: )

> [GLib] Add support for column type CSV read options
> ---
>
> Key: ARROW-3798
> URL: https://issues.apache.org/jira/browse/ARROW-3798
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-3798) [GLib] Add support for column type CSV read options

2018-11-14 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-3798:
---

 Summary: [GLib] Add support for column type CSV read options
 Key: ARROW-3798
 URL: https://issues.apache.org/jira/browse/ARROW-3798
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Commented] (ARROW-3186) [GLib] mesonbuild failures in Travis CI

2018-11-14 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687535#comment-16687535
 ] 

Kouhei Sutou commented on ARROW-3186:
-

It should work.
If it doesn't work, we should report it to Meson.

> [GLib] mesonbuild failures in Travis CI
> ---
>
> Key: ARROW-3186
> URL: https://issues.apache.org/jira/browse/ARROW-3186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Something started breaking recently with mesonbuild
> {code}
> +env CFLAGS=-DARROW_NO_DEPRECATED_API CXXFLAGS=-DARROW_NO_DEPRECATED_API 
> meson build --prefix=/home/travis/build/apache/arrow/c-glib-install-meson 
> -Dgtk_doc=true
> Traceback (most recent call last):
>   File "/home/travis/miniconda/bin/meson", line 26, in <module>
>     from mesonbuild import mesonmain
> ModuleNotFoundError: No module named 'mesonbuild'
> {code}
> Perhaps caused by the 8/25 release? https://pypi.org/project/meson/#history





[jira] [Assigned] (ARROW-3186) [GLib] mesonbuild failures in Travis CI

2018-11-14 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3186:
---

Assignee: Kouhei Sutou

> [GLib] mesonbuild failures in Travis CI
> ---
>
> Key: ARROW-3186
> URL: https://issues.apache.org/jira/browse/ARROW-3186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-3186) [GLib] mesonbuild failures in Travis CI

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3186:
--
Labels: pull-request-available  (was: )

> [GLib] mesonbuild failures in Travis CI
> ---
>
> Key: ARROW-3186
> URL: https://issues.apache.org/jira/browse/ARROW-3186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>





[jira] [Updated] (ARROW-3797) [Rust] BinaryArray::value_offset incorrect in offset case

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3797:
--
Labels: pull-request-available  (was: )

> [Rust] BinaryArray::value_offset incorrect in offset case
> -
>
> Key: ARROW-3797
> URL: https://issues.apache.org/jira/browse/ARROW-3797
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brent Kerby
>Assignee: Brent Kerby
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The method BinaryArray::value_offset does not take into account the offset in 
> the underlying ArrayData; hence it gives incorrect results when the ArrayData 
> offset is not zero.
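
The offset problem described above can be illustrated with a small model. This is a minimal Python sketch, not the Rust implementation: `BinaryArrayModel` and its fields are hypothetical names, and it assumes the usual Arrow variable-length layout of an offsets buffer plus a logical array offset.

```python
class BinaryArrayModel:
    """Hypothetical model of a variable-length binary array (not the Rust API)."""

    def __init__(self, value_offsets, value_data, offset=0):
        self.value_offsets = value_offsets  # n + 1 positions into value_data
        self.value_data = value_data
        self.offset = offset  # logical start of this array within the buffers

    def value_offset_buggy(self, i):
        # Ignores the array offset, so it is only right when offset == 0.
        return self.value_offsets[i]

    def value_offset(self, i):
        # Shift the logical index by the array offset before the lookup.
        return self.value_offsets[self.offset + i]

    def value(self, i):
        start = self.value_offsets[self.offset + i]
        end = self.value_offsets[self.offset + i + 1]
        return self.value_data[start:end]


# "hello", "parquet" and "arrow" packed back to back.
offsets = [0, 5, 12, 17]
data = b"helloparquetarrow"
sliced = BinaryArrayModel(offsets, data, offset=1)  # view starting at "parquet"
```

With a non-zero offset the two lookups disagree: the buggy variant reports the start of the first stored value rather than the first value of the slice.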





[jira] [Created] (ARROW-3797) [Rust] BinaryArray::value_offset incorrect in offset case

2018-11-14 Thread Brent Kerby (JIRA)
Brent Kerby created ARROW-3797:
--

 Summary: [Rust] BinaryArray::value_offset incorrect in offset case
 Key: ARROW-3797
 URL: https://issues.apache.org/jira/browse/ARROW-3797
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Brent Kerby
Assignee: Brent Kerby


The method BinaryArray::value_offset does not take into account the offset in 
the underlying ArrayData; hence it gives incorrect results when the ArrayData 
offset is not zero.





[jira] [Updated] (ARROW-3754) [Packaging] Zstd configure error on linux package builds

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3754:
--
Labels: pull-request-available  (was: )

> [Packaging] Zstd configure error on linux package builds
> 
>
> Key: ARROW-3754
> URL: https://issues.apache.org/jira/browse/ARROW-3754
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> Ubuntu Xenial https://travis-ci.org/kszucs/crossbow/builds/453054759
> Ubuntu Bionic https://travis-ci.org/kszucs/crossbow/builds/453054805
> Ubuntu Trusty https://travis-ci.org/kszucs/crossbow/builds/453054811
> Debian Stretch https://travis-ci.org/kszucs/crossbow/builds/453054727
> Perhaps this commit is related: 
> https://github.com/apache/arrow/commit/394b334bba1199bd2d98a158736a6652efce629f
> cc [~kou]





[jira] [Assigned] (ARROW-3765) [Gandiva] Segfault when the validity bitmap has not been allocated

2018-11-14 Thread Siyuan Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Zhuang reassigned ARROW-3765:


Assignee: Siyuan Zhuang

> [Gandiva] Segfault when the validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is because the `validity buffer` could be `None`:
> {code}
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [None, <pyarrow.lib.Buffer object at 0x...>]
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [<pyarrow.lib.Buffer object at 0x...>, <pyarrow.lib.Buffer object at 0x11a2b3228>]{code}
> But Gandiva does not handle this case yet, so it dereferences a nullptr:
> {code}
> void Annotator::PrepareBuffersForField(const FieldDescriptor& desc,
>                                        const arrow::ArrayData& array_data,
>                                        EvalBatch* eval_batch) {
>   int buffer_idx = 0;
>   // TODO:
>   // - validity is optional
>   uint8_t* validity_buf =
>       const_cast<uint8_t*>(array_data.buffers[buffer_idx]->data());
>   eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
>   ++buffer_idx;
> {code}
>  
> Code to reproduce:
> {code:java}
> frame_data = np.random.randint(0, 100, size=(2**22, 10))
> df = pd.DataFrame(frame_data)
> table = pa.Table.from_pandas(df)
> filt = ...  # Create any gandiva filter
> r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool())  # segfault
> {code}
>  Backtrace:
> {code:java}
> * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>  * frame #0: 0x0001060184fc 
> libarrow.12.dylib`arrow::Buffer::data(this=0x) const at 
> buffer.h:162
>  frame #1: 0x000106fbed78 
> libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8,
>  desc=0x00010101e138, array_data=0x00010061f8e8, 
> eval_batch=0x000100796848) at annotator.cc:65
>  frame #2: 0x000106fbf4ed 
> libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8,
>  record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94
>  frame #3: 0x0001071449b7 
> libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, 
> record_batch=0x0001007a45b8, output_vector=size=1) at 
> llvm_generator.cc:102
>  frame #4: 0x000107059a4f 
> libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, 
> batch=0x0001007a45b8, 
> out_selection=std::__1::shared_ptr::element_type @ 
> 0x0001007a43e8 strong=2 weak=1) at filter.cc:106
>  frame #5: 0x00010948e002 
> gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*,
>  _object*, _object*) + 1986
>  frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475
>  frame #7: 0x0001001d28ca Python`call_function + 602
>  frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #9: 0x0001001d3cf9 Python`fast_function + 569
>  frame #10: 0x0001001d2899 Python`call_function + 553
>  frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
>  frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48
>  frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174
>  frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277
>  frame #16: 0x00010021ef46 Python`Py_Main + 3558
>  frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248
>  frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code}
>  
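
The guard that the report asks for can be sketched independently of Gandiva. The following is a minimal Python illustration, not the actual fix: `is_valid` is a hypothetical helper, and it assumes that a missing validity buffer means the array has no nulls, with bits stored least-significant first as in the Arrow format.

```python
def is_valid(validity, i):
    """Return True if slot i is non-null.

    validity is a bytes-like, LSB-first bitmap, or None when the buffer
    was never allocated (assumed here to mean "no nulls").
    """
    if validity is None:
        return True  # guard instead of dereferencing a missing buffer
    return bool((validity[i // 8] >> (i % 8)) & 1)


bitmap = bytes([0b00000101])  # slots 0 and 2 valid, slot 1 null
```

The key point is that the `None` check happens before any access to the buffer contents, which is the step missing in the excerpt above.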





[jira] [Updated] (ARROW-3796) [Rust] Add Example for PrimitiveArrayBuilder

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3796:
--
Labels: pull-request-available  (was: )

> [Rust] Add Example for PrimitiveArrayBuilder
> 
>
> Key: ARROW-3796
> URL: https://issues.apache.org/jira/browse/ARROW-3796
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-3796) [Rust] Add Example for PrimitiveArrayBuilder

2018-11-14 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3796:
--

 Summary: [Rust] Add Example for PrimitiveArrayBuilder
 Key: ARROW-3796
 URL: https://issues.apache.org/jira/browse/ARROW-3796
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Updated] (ARROW-3713) [Rust] Implement BinaryArrayBuilder

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3713:
--
Labels: pull-request-available  (was: )

> [Rust] Implement BinaryArrayBuilder
> ---
>
> Key: ARROW-3713
> URL: https://issues.apache.org/jira/browse/ARROW-3713
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-3791) [C++] Add type inference for boolean values in CSV files

2018-11-14 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687398#comment-16687398
 ] 

Kouhei Sutou commented on ARROW-3791:
-

Yes, I want type inference for boolean values. Sorry for the ambiguous title.

I think that specifying types explicitly is very useful. But type inference is 
also useful, even if it may choose the wrong type, because it's convenient. If 
type inference fails, we can specify the type explicitly. That is more 
convenient than specifying all types explicitly.
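
The workflow described in this comment (infer by default, override when inference gets it wrong) can be sketched as follows. This is an illustrative Python model only, not the Arrow CSV reader: the function names are hypothetical, and treating only "true"/"false" (case-insensitive) as boolean spellings is an assumption.

```python
def infer_column_type(cells):
    """Guess "bool" if every cell looks boolean, else fall back to "string"."""
    if cells and all(c.strip().lower() in ("true", "false") for c in cells):
        return "bool"
    return "string"


def column_types(columns, explicit=None):
    """Explicitly specified types win; inference fills in the rest."""
    explicit = explicit or {}
    return {
        name: explicit.get(name, infer_column_type(cells))
        for name, cells in columns.items()
    }


columns = {"flag": ["true", "False", "TRUE"], "note": ["true", "maybe"]}
inferred = column_types(columns)
overridden = column_types(columns, explicit={"flag": "string"})
```

Only the column whose inferred type is unwanted needs an explicit entry; every other column keeps its inferred type.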

> [C++] Add type inference for boolean values in CSV files
> 
>
> Key: ARROW-3791
> URL: https://issues.apache.org/jira/browse/ARROW-3791
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: csv
>






[jira] [Created] (ARROW-3795) [R] Support for retrieving NAs from INT64 arrays

2018-11-14 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3795:
--

 Summary: [R] Support for retrieving NAs from INT64 arrays
 Key: ARROW-3795
 URL: https://issues.apache.org/jira/browse/ARROW-3795
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Javier Luraschi


I have a repro using sparklyr, but it is likely possible to reproduce this 
through the C++ bindings:

 

 
{code:java}
library(sparklyr)
library(arrow)

sc <- spark_connect(master = "local")
DBI::dbGetQuery(sc, "SELECT cast(NULL as bigint)")
{code}
Actual:

 
{code:java}
  CAST(NULL AS BIGINT)
1 -4332462841530417152
{code}
 

Expected:
{code:java}
  CAST(NULL AS BIGINT)
1   NA
{code}
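
The expected behaviour can be sketched in Python. This is a hedged illustration rather than the R binding code: `decode_int64` is a hypothetical helper, and the sketch assumes the unexpected number comes from reading the value slot of a null entry without consulting the validity bitmap, which the report does not confirm.

```python
import struct


def decode_int64(values_buf, validity, length):
    """Decode little-endian int64 slots, returning None for null slots."""
    raw = struct.unpack("<%dq" % length, values_buf)
    out = []
    for i, value in enumerate(raw):
        # A missing bitmap is assumed to mean that every slot is valid.
        valid = validity is None or (validity[i // 8] >> (i % 8)) & 1
        out.append(value if valid else None)
    return out


# Slot 0 is null but its value bytes hold leftover data, as in the report.
values_buf = struct.pack("<2q", -4332462841530417152, 7)
validity = bytes([0b00000010])  # slot 0 null, slot 1 valid
result = decode_int64(values_buf, validity, 2)
```

The bytes stored in a null slot are not meaningful, so a reader has to map them to the host language's missing value (`NA` in R, `None` here) instead of returning them.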
 

 





[jira] [Updated] (ARROW-3765) [Gandiva] Segfault when the validity bitmap has not been allocated

2018-11-14 Thread Siyuan Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Zhuang updated ARROW-3765:
-
Summary: [Gandiva] Segfault when the validity bitmap has not been allocated 
 (was: [Gandiva] Segfault when validity bitmap has not been allocated)

> [Gandiva] Segfault when the validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Updated] (ARROW-3765) [Gandiva] Segfault when the validity bitmap has not been allocated

2018-11-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3765:
--
Labels: pull-request-available  (was: )

> [Gandiva] Segfault when the validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Created] (ARROW-3794) [R] Consider mapping INT8 to integer() not raw()

2018-11-14 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3794:
--

 Summary: [R] Consider mapping INT8 to integer() not raw()
 Key: ARROW-3794
 URL: https://issues.apache.org/jira/browse/ARROW-3794
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Javier Luraschi


The Arrow::BINARY type maps better to R's raw(), while Arrow::INT8 maps better 
to R's integer(), since NAs are currently not supported when collecting INT8 
values and numerical operations can't be performed on raw().





[jira] [Commented] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2018-11-14 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687321#comment-16687321
 ] 

Bryan Cutler commented on ARROW-3192:
-

Thanks [~wesmckinn], yes I think these would affect Spark. I'll try to take a 
closer look at this when I can. 

> [Java] Implement "ArrowBufReadChannel" abstraction and alternate 
> MessageSerializer that uses this
> -
>
> Key: ARROW-3192
> URL: https://issues.apache.org/jira/browse/ARROW-3192
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> The current MessageSerializer implementation is wasteful when used to read an 
> IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, 
> reads out of a {{ReadChannel}} require memory allocation:
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290
> In C++, we have abstracted memory allocation out of the IPC read path so that 
> zero-copy is possible. I suggest that a similar mechanism be developed 
> for Java to improve deserialization performance for in-memory messages. The 
> new interface would return {{ArrowBuf}} when performing reads, which could be 
> zero-copy when possible; otherwise, the current allocate-and-copy strategy 
> could be used.
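
The zero-copy idea can be illustrated outside of Java. The following is a minimal Python sketch under stated assumptions, not the proposed `ArrowBufReadChannel`: `InMemoryReadChannel` is a hypothetical class, and `memoryview` slicing stands in for returning a view of an existing buffer instead of allocating and copying.

```python
class InMemoryReadChannel:
    """Hypothetical reader over a payload that is already in memory."""

    def __init__(self, buf):
        self._view = memoryview(buf)
        self._pos = 0

    def read(self, nbytes):
        # Slicing a memoryview shares the underlying memory, so no new
        # payload buffer is allocated and no bytes are copied.
        chunk = self._view[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return chunk


payload = bytearray(b"\x04\x00\x00\x00body")
channel = InMemoryReadChannel(payload)
prefix = channel.read(4)  # little-endian length prefix
body = channel.read(4)
payload[4:8] = b"BODY"  # mutate the source to show that the view aliases it
```

Because `body` is a view, the later change to `payload` is visible through it. A channel backed by a stream could not do this and would keep the allocate-and-copy path.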





[jira] [Commented] (ARROW-3419) [C++] Run include-what-you-use checks in Travis CI

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687256#comment-16687256
 ] 

Wes McKinney commented on ARROW-3419:
-

This might be better as a nightly build that can be perused as desired...

> [C++] Run include-what-you-use checks in Travis CI
> --
>
> Key: ARROW-3419
> URL: https://issues.apache.org/jira/browse/ARROW-3419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> As part of linting (and running linter checks in a separate Travis entry), we 
> should also run include-what-you-use on changed files so that we can enforce 
> include cleanliness.





[jira] [Resolved] (ARROW-3421) [C++] Add include-what-you-use setup to primary docker-compose.yml

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3421.
-
Resolution: Fixed

This was already done; it can be run with {{docker-compose run iwyu}}.

> [C++] Add include-what-you-use setup to primary docker-compose.yml
> --
>
> Key: ARROW-3421
> URL: https://issues.apache.org/jira/browse/ARROW-3421
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>






[jira] [Updated] (ARROW-3419) [C++] Run include-what-you-use checks in Travis CI

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3419:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Run include-what-you-use checks in Travis CI
> --
>
> Key: ARROW-3419
> URL: https://issues.apache.org/jira/browse/ARROW-3419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> As part of linting (and running linter checks in a separate Travis entry), we 
> should also run include-what-you-use on changed files so that we can enforce 
> include cleanliness.





[jira] [Updated] (ARROW-3775) [C++] Handling Parquet Arrow reads that overflow a BinaryArray capacity

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3775:

Summary: [C++] Handling Parquet Arrow reads that overflow a BinaryArray 
capacity  (was: [C++] Handling Arrow reads that overflow a BinaryArray capacity)

> [C++] Handling Parquet Arrow reads that overflow a BinaryArray capacity
> ---
>
> Key: ARROW-3775
> URL: https://issues.apache.org/jira/browse/ARROW-3775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.12.0
>
>
> See comment thread in 
> https://stackoverflow.com/questions/48115087/converting-parquetfile-to-pandas-dataframe-with-a-column-with-a-set-of-string-in
>  





[jira] [Updated] (ARROW-2997) [Python] Scripts for uploading conda binary release artifacts to anaconda.org under @apache account

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2997:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Scripts for uploading conda binary release artifacts to anaconda.org 
> under @apache account
> ---
>
> Key: ARROW-2997
> URL: https://issues.apache.org/jira/browse/ARROW-2997
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We do not have the ability to post these artifacts on conda-forge, but we 
> could post them under the apache account at https://anaconda.org/apache





[jira] [Updated] (ARROW-2981) [C++] Support scripts / documentation for running clang-tidy on codebase

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2981:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Support scripts / documentation for running clang-tidy on codebase
> 
>
> Key: ARROW-2981
> URL: https://issues.apache.org/jira/browse/ARROW-2981
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Related to ARROW-2952, ARROW-2980





[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3325:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Support reading Parquet binary/string columns as pandas Categorical
> 
>
> Key: ARROW-3325
> URL: https://issues.apache.org/jira/browse/ARROW-3325
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Requires PARQUET-1324 and probably quite a bit of extra work.
> Properly implementing this will require dictionary normalization across row 
> groups. When reading a new row group, a fast path that compares the current 
> dictionary with the prior dictionary should be used. This also needs to 
> handle the case where a column chunk "fell back" to PLAIN encoding mid-stream
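
To make the row-group fast path described above more concrete, here is a rough 
sketch. It is not the Parquet or Arrow reader code: DictionaryNormalizer, 
Observe, and the use of std::string dictionary values are assumptions made 
for illustration, and the PLAIN fallback case mentioned in the issue is 
deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not an Arrow or Parquet class): builds one unified
// dictionary from the per-row-group dictionaries it is shown.
class DictionaryNormalizer {
 public:
  // Returns, for each index of `dictionary`, its index in the unified
  // dictionary. Indices read from the row group can be remapped through it.
  std::vector<int32_t> Observe(const std::vector<std::string>& dictionary) {
    // Fast path: the row group reuses the prior dictionary unchanged, so the
    // prior mapping is still valid and no hash lookups are needed.
    if (has_prior_ && dictionary == prior_dictionary_) {
      ++fast_path_hits_;
      return prior_mapping_;
    }
    // Slow path: look up each value, appending unseen values to the unified
    // dictionary.
    std::vector<int32_t> mapping;
    mapping.reserve(dictionary.size());
    for (const std::string& value : dictionary) {
      auto it = index_of_.find(value);
      if (it == index_of_.end()) {
        it = index_of_.emplace(value, static_cast<int32_t>(unified_.size())).first;
        unified_.push_back(value);
      }
      mapping.push_back(it->second);
    }
    prior_dictionary_ = dictionary;
    prior_mapping_ = mapping;
    has_prior_ = true;
    return mapping;
  }

  const std::vector<std::string>& unified() const { return unified_; }
  int fast_path_hits() const { return fast_path_hits_; }

 private:
  std::unordered_map<std::string, int32_t> index_of_;
  std::vector<std::string> unified_;
  std::vector<std::string> prior_dictionary_;
  std::vector<int32_t> prior_mapping_;
  bool has_prior_ = false;
  int fast_path_hits_ = 0;
};
```

Comparing whole dictionaries is linear in the dictionary size, which is cheap 
next to remapping every value through a hash table when row groups share a 
dictionary.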





[jira] [Commented] (ARROW-2888) [Plasma] Several GPU-related APIs are used in places where errors cannot be appropriately handled

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687240#comment-16687240
 ] 

Wes McKinney commented on ARROW-2888:
-

Plasma CUDA support is still experimental, but it would be a good idea to fix 
these at some point

> [Plasma] Several GPU-related APIs are used in places where errors cannot be 
> appropriately handled
> -
>
> Key: ARROW-2888
> URL: https://issues.apache.org/jira/browse/ARROW-2888
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I'm adding {{DCHECK_OK}} statements for ARROW-2883 to fix the unchecked 
> Status warnings, but this code should be refactored so that these errors can 
> bubble up properly





[jira] [Commented] (ARROW-2981) [C++] Support scripts / documentation for running clang-tidy on codebase

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687238#comment-16687238
 ] 

Wes McKinney commented on ARROW-2981:
-

Probably not all of them, but many of them yes

> [C++] Support scripts / documentation for running clang-tidy on codebase
> 
>
> Key: ARROW-2981
> URL: https://issues.apache.org/jira/browse/ARROW-2981
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Related to ARROW-2952, ARROW-2980





[jira] [Updated] (ARROW-2888) [Plasma] Several GPU-related APIs are used in places where errors cannot be appropriately handled

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2888:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Plasma] Several GPU-related APIs are used in places where errors cannot be 
> appropriately handled
> -
>
> Key: ARROW-2888
> URL: https://issues.apache.org/jira/browse/ARROW-2888
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I'm adding {{DCHECK_OK}} statements for ARROW-2883 to fix the unchecked 
> Status warnings, but this code should be refactored so that these errors can 
> bubble up properly





[jira] [Commented] (ARROW-3255) [C++/Python] Migrate Travis CI jobs off Xcode 6.4

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687230#comment-16687230
 ] 

Wes McKinney commented on ARROW-3255:
-

What is conda-forge's plan about this?

> [C++/Python] Migrate Travis CI jobs off Xcode 6.4
> -
>
> Key: ARROW-3255
> URL: https://issues.apache.org/jira/browse/ARROW-3255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Travis CI says they are winding down their support for Xcode 6.4, which we 
> use in our CI as the minimum Xcode version that can build Arrow libraries:
> "Running builds with Xcode 6.4 in Travis CI is deprecated and will be removed 
> in January 2019.
> If Xcode 6.4 is critical to your builds, please contact our support team at 
> supp...@travis-ci.com to discuss options.
> Services are not supported on osx"
> We should decide whether we want to continue to support this version of Xcode, 
> and what the implications are if we do not





[jira] [Updated] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3294:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3290) [C++] Toolchain support for secure gRPC

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3290:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Toolchain support for secure gRPC 
> 
>
> Key: ARROW-3290
> URL: https://issues.apache.org/jira/browse/ARROW-3290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> In ARROW-3146 I added support for the narrow use case of CMake-installed gRPC 
> and linking with the unsecure libraries. Connecting to secure services 
> requires a number of additional dependencies





[jira] [Updated] (ARROW-2041) [Python] pyarrow.serialize has high overhead for list of NumPy arrays

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2041:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] pyarrow.serialize has high overhead for list of NumPy arrays
> -
>
> Key: ARROW-2041
> URL: https://issues.apache.org/jira/browse/ARROW-2041
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Richard Shin
>Priority: Major
> Fix For: 0.13.0
>
>
> {{Python 2.7.12 (default, Nov 20 2017, 18:23:56)}}
> {{[GCC 5.4.0 20160609] on linux2}}
> {{Type "help", "copyright", "credits" or "license" for more information.}}
> {{>>> import pyarrow as pa, numpy as np}}
> {{>>> arrays = [np.arange(100, dtype=np.int32) for _ in range(1)]}}
> {{>>> with open('test.pyarrow', 'w') as f:}}
> {{...     f.write(pa.serialize(arrays).to_buffer().to_pybytes())}}
> {{...}}
> {{>>> import cPickle as pickle}}
> {{>>> pickle.dump(arrays, open('test.pkl', 'w'), pickle.HIGHEST_PROTOCOL)}}
> test.pyarrow is 6.2 MB, while test.pkl is only 4.2 MB.





[jira] [Updated] (ARROW-3016) [C++] Add ability to enable call stack logging for each memory allocation

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3016:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add ability to enable call stack logging for each memory allocation
> -
>
> Key: ARROW-3016
> URL: https://issues.apache.org/jira/browse/ARROW-3016
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> It is possible to gain programmatic access to the call stack in C/C++, e.g.
> https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/
> It would be valuable to have a debugging option to log the sizes of memory 
> allocations as well as showing the call stack where that allocation is 
> performed. In complex programs, this could help determine the origin of a 
> memory leak





[jira] [Updated] (ARROW-3401) [C++] Pluggable statistics collector API for unconvertible CSV values

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3401:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Pluggable statistics collector API for unconvertible CSV values
> -
>
> Key: ARROW-3401
> URL: https://issues.apache.org/jira/browse/ARROW-3401
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> It would be useful to be able to collect statistics (e.g. distinct value 
> counts) about values in a column of a CSV file that cannot be converted to a 
> desired data type. 
> When conversion fails, the converters can call into an abstract API like
> {code}
> statistics_->CannotConvert(token, size);
> {code}
> or something similar
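
One way the abstract API above could be shaped is sketched below. Only the 
CannotConvert(token, size) call comes from the issue text; the 
ConversionStatistics base class, the DistinctValueCounter collector, and the 
toy TryConvertInt32 converter are hypothetical names used for illustration, 
not part of the Arrow CSV reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical abstract interface that converters call into on failure.
class ConversionStatistics {
 public:
  virtual ~ConversionStatistics() = default;
  virtual void CannotConvert(const char* token, int32_t size) = 0;
};

// One pluggable collector: counts how often each unconvertible value occurs.
class DistinctValueCounter : public ConversionStatistics {
 public:
  void CannotConvert(const char* token, int32_t size) override {
    ++counts_[std::string(token, static_cast<std::size_t>(size))];
  }
  const std::map<std::string, int64_t>& counts() const { return counts_; }

 private:
  std::map<std::string, int64_t> counts_;
};

// Toy converter: accepts 1 to 9 decimal digits (always fits in int32_t) and
// reports anything else to the statistics collector.
bool TryConvertInt32(const std::string& token, ConversionStatistics* statistics,
                     int32_t* out) {
  bool ok = !token.empty() && token.size() <= 9;
  int32_t value = 0;
  for (std::size_t i = 0; ok && i < token.size(); ++i) {
    const char c = token[i];
    if (c < '0' || c > '9') {
      ok = false;
    } else {
      value = value * 10 + (c - '0');
    }
  }
  if (!ok) {
    statistics->CannotConvert(token.data(), static_cast<int32_t>(token.size()));
    return false;
  }
  *out = value;
  return true;
}
```

Keeping the collector behind a virtual interface means the converters do not 
need to know whether the caller wants distinct counts, samples, or nothing at 
all.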





[jira] [Updated] (ARROW-3378) [C++] Implement whitespace CSV tokenizer

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3378:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Implement whitespace CSV tokenizer
> 
>
> Key: ARROW-3378
> URL: https://issues.apache.org/jira/browse/ARROW-3378
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3379:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Implement regex/multichar delimiter tokenizer
> ---
>
> Key: ARROW-3379
> URL: https://issues.apache.org/jira/browse/ARROW-3379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: csv
> Fix For: 0.13.0
>
>






[jira] [Resolved] (ARROW-3362) [R] Guard against null buffers

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3362.
-
Resolution: Fixed
  Assignee: Romain François

This was resolved in other patches

> [R] Guard against null buffers
> --
>
> Key: ARROW-3362
> URL: https://issues.apache.org/jira/browse/ARROW-3362
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Romain François
>Priority: Major
> Fix For: 0.12.0
>
>
> The R C++ bindings access a data buffer in a number of places where it may 
> be null, as can happen with a 0-length array. This will cause a segfault, 
> for example at
> https://github.com/apache/arrow/blob/master/r/src/array.cpp#L245
> I suggest defining some helper functions as we have done elsewhere to assist 
> with getting a typed offset into a buffer but passing through nullptr if the 
> buffer object is null
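
The helper suggested above could look roughly like the following. This is a 
minimal sketch, not the code in r/src/array.cpp: SketchBuffer and 
GetValuesSafely are hypothetical names, with SketchBuffer standing in for a 
real buffer class.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a buffer class; this is not arrow::Buffer.
struct SketchBuffer {
  std::vector<uint8_t> bytes;
  const uint8_t* data() const { return bytes.data(); }
};

// Returns a typed pointer `offset` elements into `buffer`, or nullptr when
// the buffer object itself is null (for example, for a 0-length array).
// Callers must still check the result before dereferencing it.
template <typename T>
const T* GetValuesSafely(const std::shared_ptr<SketchBuffer>& buffer,
                         int64_t offset) {
  if (buffer == nullptr) {
    return nullptr;
  }
  return reinterpret_cast<const T*>(buffer->data()) + offset;
}
```

Centralizing the null check in one helper keeps each call site from having to 
repeat it before computing the typed offset.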





[jira] [Updated] (ARROW-3328) [Flight] Allow for optional unique flight identifier to be sent with FlightGetInfo

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3328:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Flight] Allow for optional unique flight identifier to be sent with 
> FlightGetInfo
> --
>
> Key: ARROW-3328
> URL: https://issues.apache.org/jira/browse/ARROW-3328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There could either be
> * A global identifier for the entire flight
> * Endpoint-specific identifiers
> A client could use these unique identifiers to perform other kinds of actions. 
> An example would be retrieving logs or statistics about a get -- you could 
> see time spent writing the dataset to gRPC or time spent constructing the 
> dataset before handing off to the gRPC write layer





[jira] [Updated] (ARROW-3082) [C++] Add SSL support for hiveserver2

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3082:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add SSL support for hiveserver2
> -
>
> Key: ARROW-3082
> URL: https://issues.apache.org/jira/browse/ARROW-3082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: HiveServer2
> Fix For: 0.13.0
>
>
> This amounts to using the TSSLSocket in Thrift





[jira] [Resolved] (ARROW-3312) [R] Use same .clang-format file for both R binding C++ code and main C++ codebase

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3312.
-
Resolution: Fixed
  Assignee: Wes McKinney

This was done in 
https://github.com/apache/arrow/commit/2a6c0cbf92318103df2d74cf089b6ad11b32d373

> [R] Use same .clang-format file for both R binding C++ code and main C++ 
> codebase
> -
>
> Key: ARROW-3312
> URL: https://issues.apache.org/jira/browse/ARROW-3312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Comment to ARROW-3282





[jira] [Updated] (ARROW-3052) [C++] Support ORC, GRPC, Thrift, and Protobuf when using $ARROW_BUILD_TOOLCHAIN

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3052:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Support ORC, GRPC, Thrift, and Protobuf when using 
> $ARROW_BUILD_TOOLCHAIN
> ---
>
> Key: ARROW-3052
> URL: https://issues.apache.org/jira/browse/ARROW-3052
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> It would be good to support these additional toolchain components without 
> having to set extra environment variables





[jira] [Updated] (ARROW-3103) [C++] Conversion to Arrow record batch for HiveServer2 ColumnarRowSet

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3103:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Conversion to Arrow record batch for HiveServer2 ColumnarRowSet
> -
>
> Key: ARROW-3103
> URL: https://issues.apache.org/jira/browse/ARROW-3103
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: HiveServer2, database
> Fix For: 0.13.0
>
>






[jira] [Assigned] (ARROW-3318) [C++] Convenience method for reading all batches from an IPC stream or file as arrow::Table

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3318:
---

Assignee: (was: Atri Sharma)

> [C++] Convenience method for reading all batches from an IPC stream or file 
> as arrow::Table
> ---
>
> Key: ARROW-3318
> URL: https://issues.apache.org/jira/browse/ARROW-3318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> This is being implemented more than once in binding layers





[jira] [Updated] (ARROW-3308) [R] Convert R character vector with data exceeding 2GB to chunked array

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3308:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [R] Convert R character vector with data exceeding 2GB to chunked array
> ---
>
> Key: ARROW-3308
> URL: https://issues.apache.org/jira/browse/ARROW-3308
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3292) [C++] Test Flight RPC in Travis CI

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3292:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Test Flight RPC in Travis CI
> --
>
> Key: ARROW-3292
> URL: https://issues.apache.org/jira/browse/ARROW-3292
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3289) [C++] Implement DoPut command for Flight on client and server side

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3289:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Implement DoPut command for Flight on client and server side  
> 
>
> Key: ARROW-3289
> URL: https://issues.apache.org/jira/browse/ARROW-3289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> This was omitted from ARROW-3146





[jira] [Commented] (ARROW-3292) [C++] Test Flight RPC in Travis CI

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687232#comment-16687232
 ] 

Wes McKinney commented on ARROW-3292:
-

Need gRPC toolchain support first. Not sure that will get done in the next 
couple of weeks

> [C++] Test Flight RPC in Travis CI
> --
>
> Key: ARROW-3292
> URL: https://issues.apache.org/jira/browse/ARROW-3292
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3200) [C++] Add support for reading Flight streams with dictionaries

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3200:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add support for reading Flight streams with dictionaries
> --
>
> Key: ARROW-3200
> URL: https://issues.apache.org/jira/browse/ARROW-3200
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> Some work is needed to handle schemas sent separately from their 
> dictionaries, i.e. ARROW-3144. I'm going to punt on implementing support for 
> this in the initial C++ Flight client





[jira] [Updated] (ARROW-3255) [C++/Python] Migrate Travis CI jobs off Xcode 6.4

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3255:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++/Python] Migrate Travis CI jobs off Xcode 6.4
> -
>
> Key: ARROW-3255
> URL: https://issues.apache.org/jira/browse/ARROW-3255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Travis CI says they are winding down their support for Xcode 6.4, which we 
> use in our CI as the minimum Xcode version that can build Arrow libraries:
> "Running builds with Xcode 6.4 in Travis CI is deprecated and will be removed 
> in January 2019.
> If Xcode 6.4 is critical to your builds, please contact our support team at 
> supp...@travis-ci.com to discuss options.
> Services are not supported on osx"
> We should decide whether we want to continue to support this version of Xcode, 
> and what the implications are if we do not





[jira] [Commented] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687229#comment-16687229
 ] 

Wes McKinney commented on ARROW-3192:
-

cc [~bryanc], in case this might impact Spark anywhere

> [Java] Implement "ArrowBufReadChannel" abstraction and alternate 
> MessageSerializer that uses this
> -
>
> Key: ARROW-3192
> URL: https://issues.apache.org/jira/browse/ARROW-3192
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> The current MessageSerializer implementation is wasteful when used to read an 
> IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, 
> reads out of a {{ReadChannel}} require memory allocation
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290
> In C++, we have abstracted memory allocation out of the IPC read path so that 
> zero-copy is possible. I suggest that a similar mechanism can be developed 
> for Java to improve deserialization performance for in-memory messages. The 
> new interface would return {{ArrowBuf}} when performing reads, which could be 
> zero-copy when possible and fall back to the current allocate-copy strategy 
> otherwise





[jira] [Updated] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3192:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Java] Implement "ArrowBufReadChannel" abstraction and alternate 
> MessageSerializer that uses this
> -
>
> Key: ARROW-3192
> URL: https://issues.apache.org/jira/browse/ARROW-3192
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> The current MessageSerializer implementation is wasteful when used to read an 
> IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, 
> reads out of a {{ReadChannel}} require memory allocation
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290
> In C++, we have abstracted memory allocation out of the IPC read path so that 
> zero-copy is possible. I suggest that a similar mechanism can be developed 
> for Java to improve deserialization performance for in-memory messages. The 
> new interface would return {{ArrowBuf}} when performing reads, which could be 
> zero-copy when possible and fall back to the current allocate-copy strategy 
> otherwise





[jira] [Updated] (ARROW-3189) [Python] Support seek(...) on writable files that support it

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3189:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Support seek(...) on writable files that support it 
> -
>
> Key: ARROW-3189
> URL: https://issues.apache.org/jira/browse/ARROW-3189
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See relevant mailing list discussion
> https://lists.apache.org/thread.html/67fc945fa01b7cf682a241f36de09fe495b84b119868dd7c9f8168ba@%3Cdev.arrow.apache.org%3E





[jira] [Commented] (ARROW-3084) [Python] Do we need to build both unicode variants of pyarrow wheels?

2018-11-14 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687225#comment-16687225
 ] 

Antoine Pitrou commented on ARROW-3084:
---

I expect this issue will get fixed simply when we abandon 2.7.

> [Python] Do we need to build both unicode variants of pyarrow wheels?
> -
>
> Key: ARROW-3084
> URL: https://issues.apache.org/jira/browse/ARROW-3084
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I noticed that pandas does not provide a UCS2 wheel for Python 2.7. We're 
> building both UCS2 and UCS4. I am curious if the UCS2 wheels are widely used 
> enough to make this worthwhile





[jira] [Updated] (ARROW-3134) [C++] Implement n-ary iterator for a collection of chunked arrays with possibly different chunking layouts

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3134:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Implement n-ary iterator for a collection of chunked arrays with 
> possibly different chunking layouts
> --
>
> Key: ARROW-3134
> URL: https://issues.apache.org/jira/browse/ARROW-3134
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This is a common pattern that will result in kernel invocation on chunked 
> arrays
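
As a rough illustration of the iteration pattern, the sketch below walks 
several chunked arrays in lockstep and yields segments that never cross a 
chunk boundary in any of them. It works on chunk lengths only; ChunkSlice, 
AlignedSegment, and AlignChunks are hypothetical names, not an existing Arrow 
API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Position inside one chunked array: which chunk, and the offset within it.
struct ChunkSlice {
  std::size_t chunk_index;
  int64_t offset;
};

// A run of `length` rows that lies within a single chunk of every array.
struct AlignedSegment {
  int64_t length;
  std::vector<ChunkSlice> slices;  // one entry per input array
};

// `chunk_lengths[i]` lists the chunk lengths of array i. Iteration stops as
// soon as any array is exhausted.
std::vector<AlignedSegment> AlignChunks(
    const std::vector<std::vector<int64_t>>& chunk_lengths) {
  const std::size_t n = chunk_lengths.size();
  std::vector<std::size_t> chunk(n, 0);
  std::vector<int64_t> offset(n, 0);
  std::vector<AlignedSegment> segments;
  if (n == 0) return segments;
  while (true) {
    int64_t step = std::numeric_limits<int64_t>::max();
    for (std::size_t i = 0; i < n; ++i) {
      // Skip chunks that are empty or already fully consumed.
      while (chunk[i] < chunk_lengths[i].size() &&
             offset[i] == chunk_lengths[i][chunk[i]]) {
        ++chunk[i];
        offset[i] = 0;
      }
      if (chunk[i] == chunk_lengths[i].size()) return segments;
      // The segment can be no longer than what remains in this chunk.
      step = std::min(step, chunk_lengths[i][chunk[i]] - offset[i]);
    }
    AlignedSegment segment;
    segment.length = step;
    for (std::size_t i = 0; i < n; ++i) {
      segment.slices.push_back({chunk[i], offset[i]});
      offset[i] += step;
    }
    segments.push_back(segment);
  }
}
```

A kernel could then be invoked once per segment, slicing each input chunk at 
the recorded offset for the recorded length.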





[jira] [Updated] (ARROW-3087) [C++] Add kernels for comparison operations to scalars

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3087:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add kernels for comparison operations to scalars
> --
>
> Key: ARROW-3087
> URL: https://issues.apache.org/jira/browse/ARROW-3087
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.13.0
>
>
> This should implement the comparison operators {{>=, >, ==, !=, <, <=}} 
> between {{arrow::compute::Datum}} and {{arrow::compute::Scalar}}. 
> The result of this kernel will be a boolean-typed {{arrow::compute::Datum}} 
> with True/False set according to the outcome of the operation, and NA where 
> a row was not valid.
> A pre-condition to implement this kernel is to have a working implementation 
> of {{arrow::compute::Scalar}}.
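
The intended semantics can be sketched without any Arrow types. In the 
example below, Int64Column, BoolColumn, CompareOp, and CompareToScalar are 
hypothetical stand-ins for the Datum and Scalar classes named in the issue, 
and validity is modeled as a plain vector of flags rather than a bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class CompareOp { GE, GT, EQ, NE, LT, LE };

// Hypothetical column types: values plus one validity flag per row.
// `values` and `valid` are assumed to have the same length.
struct Int64Column {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

struct BoolColumn {
  std::vector<bool> values;
  std::vector<bool> valid;
};

bool ApplyOp(CompareOp op, int64_t left, int64_t right) {
  switch (op) {
    case CompareOp::GE: return left >= right;
    case CompareOp::GT: return left > right;
    case CompareOp::EQ: return left == right;
    case CompareOp::NE: return left != right;
    case CompareOp::LT: return left < right;
    case CompareOp::LE: return left <= right;
  }
  return false;
}

// Compares every valid row against `scalar`. Rows that are not valid in the
// input stay not valid (NA) in the output, and their value is left as false.
BoolColumn CompareToScalar(const Int64Column& input, CompareOp op,
                           int64_t scalar) {
  BoolColumn out;
  out.values.resize(input.values.size(), false);
  out.valid = input.valid;
  for (std::size_t i = 0; i < input.values.size(); ++i) {
    if (input.valid[i]) {
      out.values[i] = ApplyOp(op, input.values[i], scalar);
    }
  }
  return out;
}
```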





[jira] [Updated] (ARROW-3084) [Python] Do we need to build both unicode variants of pyarrow wheels?

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3084:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Do we need to build both unicode variants of pyarrow wheels?
> -
>
> Key: ARROW-3084
> URL: https://issues.apache.org/jira/browse/ARROW-3084
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I noticed that pandas does not provide a UCS2 wheel for Python 2.7. We're 
> building both UCS2 and UCS4. I am curious if the UCS2 wheels are widely used 
> enough to make this worthwhile





[jira] [Updated] (ARROW-3104) [Python] Python bindings for HiveServer2 client interface

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3104:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Python bindings for HiveServer2 client interface
> -
>
> Key: ARROW-3104
> URL: https://issues.apache.org/jira/browse/ARROW-3104
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: HiveServer2
> Fix For: 0.13.0
>
>
> These will be a 1-1 mapping to the current C++ classes, with support for 
> yielding Arrow record batches or tables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3133) [C++] Logical boolean kernels in kernels/boolean.cc cannot write into preallocated memory

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3133:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Logical boolean kernels in kernels/boolean.cc cannot write into 
> preallocated memory
> -
>
> Key: ARROW-3133
> URL: https://issues.apache.org/jira/browse/ARROW-3133
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3081) [C++] Add LDAP authentication for hiveserver2

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3081:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add LDAP authentication for hiveserver2
> -
>
> Key: ARROW-3081
> URL: https://issues.apache.org/jira/browse/ARROW-3081
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: HiveServer2
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3077) [Website] Add page summarizing project contributions since project inception

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3077:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Website] Add page summarizing project contributions since project inception
> 
>
> Key: ARROW-3077
> URL: https://issues.apache.org/jira/browse/ARROW-3077
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We have already been doing this on a per-release basis, e.g.
> http://arrow.apache.org/release/0.10.0.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3054) [Packaging] Deploy nightlies built using crossbow to the twosigma conda channel

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687222#comment-16687222
 ] 

Wes McKinney commented on ARROW-3054:
-

+1 for arrow-nightlies. 

> [Packaging] Deploy nightlies built using crossbow to the twosigma conda 
> channel
> ---
>
> Key: ARROW-3054
> URL: https://issues.apache.org/jira/browse/ARROW-3054
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.10.0
>Reporter: Phillip Cloud
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3078) [Python] Docker integration tests should not contaminate the local Python development environment

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3078:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Docker integration tests should not contaminate the local Python 
> development environment
> -
>
> Key: ARROW-3078
> URL: https://issues.apache.org/jira/browse/ARROW-3078
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I hit the following error when running run_docker_compose.sh hdfs_integration
> {code}
> ___ ERROR collecting pyarrow/tests/test_builder.py 
> 
> import file mismatch:
> imported module 'pyarrow.tests.test_builder' has this __file__ attribute:
>   /home/wesm/code/arrow/python/pyarrow/tests/test_builder.py
> which is not the same as the test file we want to collect:
>   /apache-arrow/arrow/python/pyarrow/tests/test_builder.py
> HINT: remove __pycache__ / .pyc files and/or use a unique basename for your 
> test file modules
> ___ ERROR collecting pyarrow/tests/test_convert_builtin.py 
> 
> import file mismatch:
> imported module 'pyarrow.tests.test_convert_builtin' has this __file__ 
> attribute:
>   /home/wesm/code/arrow/python/pyarrow/tests/test_convert_builtin.py
> which is not the same as the test file we want to collect:
>   /apache-arrow/arrow/python/pyarrow/tests/test_convert_builtin.py
> HINT: remove __pycache__ / .pyc files and/or use a unique basename for your 
> test file modules
>  ERROR collecting pyarrow/tests/test_convert_pandas.py 
> 
> {code}
> The Docker tests should ideally be isolated from the state of the local git 
> clone
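
Until the Docker tests are isolated, the pytest HINT in the log above can be followed by hand. The sketch below is a local workaround, not the isolation fix this issue asks for; the helper name `remove_bytecode_caches` is hypothetical and it uses only the standard library.

```python
import os
import shutil


def remove_bytecode_caches(root):
    """Delete __pycache__ directories and stray .pyc files under `root`.

    Returns the number of filesystem entries removed.
    """
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if "__pycache__" in dirnames:
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            # Prune the directory so os.walk does not try to descend into it.
            dirnames.remove("__pycache__")
            removed += 1
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed
```

Running it on the `python/` directory of the local clone before starting the containers should remove the cached modules whose `__file__` points at the host path, assuming the mismatch really does come from those cached files.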



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3054) [Packaging] Deploy nightlies built using crossbow to the twosigma conda channel

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3054:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Packaging] Deploy nightlies built using crossbow to the twosigma conda 
> channel
> ---
>
> Key: ARROW-3054
> URL: https://issues.apache.org/jira/browse/ARROW-3054
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.10.0
>Reporter: Phillip Cloud
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2609:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2609
> URL: https://issues.apache.org/jira/browse/ARROW-2609
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
> types. Types like List, Struct or Union that have children in their 
> definition are not supported. We should add the needed recursion for these 
> types and enable the respective tests.
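
The recursion mentioned above can be sketched as follows. This does not use the real `pyarrow.Field.from_jvm` code or any JVM objects: a field is modelled as a hypothetical dict with `name`, `type` and optional `children` keys, and the function only renders a type string, to show where the recursive call for child fields would sit.

```python
def describe_field(field):
    """Recursively render a nested field description as a type string.

    `field` is a hypothetical dict: {"name": ..., "type": ..., "children": [...]}.
    """
    children = field.get("children", [])
    if not children:
        # Primitive type: nothing to recurse into.
        return field["type"]
    # Nested type (list, struct, union): convert every child first.
    rendered = ", ".join(
        "{}: {}".format(child["name"], describe_field(child))
        for child in children
    )
    return "{}<{}>".format(field["type"], rendered)


nested = {
    "name": "points",
    "type": "list",
    "children": [
        {
            "name": "item",
            "type": "struct",
            "children": [
                {"name": "x", "type": "int32"},
                {"name": "y", "type": "double"},
            ],
        }
    ],
}
print(describe_field(nested))
# prints list<item: struct<x: int32, y: double>>
```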



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2619) [Rust] Move JSON serde code to separate file/module

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2619:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Rust] Move JSON serde code to separate file/module
> ---
>
> Key: ARROW-2619
> URL: https://issues.apache.org/jira/browse/ARROW-2619
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2606) [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2606:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm
> -
>
> Key: ARROW-2606
> URL: https://issues.apache.org/jira/browse/ARROW-2606
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to 
> find the correct code to construct Java decimals and fill them into a 
> {{DecimalVector}}. Afterwards, we should activate the decimal128 type on 
> {{test_jvm_array}} and ensure that we load them correctly from Java into 
> Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3048) [Python] Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue)

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687221#comment-16687221
 ] 

Wes McKinney commented on ARROW-3048:
-

Is this still an issue?

> [Python] Import pyarrow fails if scikit-learn is installed from conda 
> (boost-cpp / libboost issue)
> --
>
> Key: ARROW-3048
> URL: https://issues.apache.org/jira/browse/ARROW-3048
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
> Environment: Ubuntu 16.04
>Reporter: Jarno Seppanen
>Priority: Major
> Fix For: 0.12.0
>
>
> Hi, installing both pyarrow 0.10.0 and scikit-learn 0.19.2 causes pyarrow 
> import to break.
> Steps to reproduce
>  # cat >environment.yml <<EOF
> {code:java}
> name: asdf
> channels:
> - defaults
> - conda-forge
> dependencies:
> - python=3.6
> - pyarrow=0.10.0
> - scikit-learn=0.19.2{code}
> EOF
>  # conda env create
>  # source activate asdf
>  # python -c 'import pyarrow'
> {code:java}
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File "/home/jarno/miniconda3/envs/asdf/lib/python3.6/site-packages/pyarrow/__init__.py", line 60, in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> /home/jarno/miniconda3/envs/asdf/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.1:
>  undefined symbol: 
> _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS5_12maybe_assignERKS9_{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2602:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++/Python] Automate build of development docker container
> ---
>
> Key: ARROW-2602
> URL: https://issues.apache.org/jira/browse/ARROW-2602
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> With 
> [https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
>  we provide a convenience docker container so that one can develop Arrow 
> without running into the hassle of setting up the development toolchain on 
> the local machine.
> The current base image is not built automatically, as we are waiting for input 
> from INFRA on https://issues.apache.org/jira/browse/INFRA-16533
> Once we know how to upload continuously to Docker Hub, we should move the 
> Dockerfile appropriately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-2987) [Python] test_cython_api can fail if run in an environment where vcvarsall.bat has been run more than once

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2987.
---

> [Python] test_cython_api can fail if run in an environment where 
> vcvarsall.bat has been run more than once
> --
>
> Key: ARROW-2987
> URL: https://issues.apache.org/jira/browse/ARROW-2987
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> I encountered this when verifying the 0.10.0 release on Windows
> {code}
> pyarrow/tests/test_cython.py::test_cython_api Compiling 
> pyarrow_cython_example.pyx because it changed.
> [1/1] Cythonizing pyarrow_cython_example.pyx
> running build_ext
> building 'pyarrow_cython_example' extension
> b'*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\r\x00\n\x00*\x00*\x00
>  \x00V\x00i\x00s\x00u\x00a\x00l\x00 \x00S\x00t\x00u\x00d\x00i\x00o\x00 
> \x002\x000\x001\x007\x00 \x00D\x00e\x00v\x00e\x00l\x00o\x00p\x00e\x00r\x00 
> \x00C\x00o\x00m\x00m\x00a\x00n\x00d\x00 \x00P\x00r\x00o\x00m\x00p\x00t\x00 
> \x00v\x001\x005\x00.\x004\x00.\x001\x00\r\x00\n\x00*\x00*\x00 
> \x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t\x00 \x00(\x00c\x00)\x00 
> \x002\x000\x001\x007\x00 \x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00 
> \x00C\x00o\x00r\x00p\x00o\x00r\x00a\x00t\x00i\x00o\x00n\x00\r\x00\n\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\r\x00\n\x00T\x00h\x00e\x00
>  \x00i\x00n\x00p\x00u\x00t\x00 \x00l\x00i\x00n\x00e\x00 \x00i\x00s\x00 
> \x00t\x00o\x00o\x00 \x00l\x00o\x00n\x00g\x00.\x00\r\x00\n\x00T\x00h\x00e\x00 
> \x00s\x00y\x00n\x00t\x00a\x00x\x00 \x00o\x00f\x00 \x00t\x00h\x00e\x00 
> \x00c\x00o\x00m\x00m\x00a\x00n\x00d\x00 \x00i\x00s\x00 
> \x00i\x00n\x00c\x00o\x00r\x00r\x00e\x00c\x00t\x00.\x00\r\x00\n\x00'
> error: Error executing cmd /u /c "C:\Program Files (x86)\Microsoft Visual 
> Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64 && set
> {code}
> Seems to be the same issue as 
> https://developercommunity.visualstudio.com/content/problem/257260/vcvarsallbat-reports-the-input-line-is-too-long-if.html
> I think the problem is that this unit test inherits the calling environment, 
> and distutils runs this command again which alters the environment further. 
> I'm not sure what's the best way to work around this yet
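
One possible mitigation, sketched under assumptions: if the failure comes from `PATH` growing each time `vcvarsall.bat` is re-run, then removing repeated entries before spawning the build may keep the command line short enough. Whether this actually avoids the "input line is too long" error is not verified here, and the helper names `dedupe_path` and `deduped_environment` are hypothetical.

```python
import os


def dedupe_path(path_value, sep=os.pathsep):
    """Drop empty and repeated entries, keeping first occurrences in order."""
    seen = set()
    kept = []
    for entry in path_value.split(sep):
        # normcase makes the comparison case-insensitive on Windows.
        key = os.path.normcase(entry)
        if entry and key not in seen:
            seen.add(key)
            kept.append(entry)
    return sep.join(kept)


def deduped_environment(environ=None):
    """Return a copy of the environment whose PATH has no duplicates."""
    env = dict(os.environ if environ is None else environ)
    if "PATH" in env:
        env["PATH"] = dedupe_path(env["PATH"])
    return env


print(dedupe_path("a;b;a;;c;b", sep=";"))
# prints a;b;c
```

The returned dict could be passed as the `env` argument of `subprocess.run` when launching the test, so the child process does not inherit the duplicated entries.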



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3032) [Python] Clean up NumPy-related C++ headers

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3032:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Clean up NumPy-related C++ headers
> ---
>
> Key: ARROW-3032
> URL: https://issues.apache.org/jira/browse/ARROW-3032
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are 4 different headers. After ARROW-2814, we can probably eliminate 
> numpy_convert.h and combine with numpy_to_arrow.h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2987) [Python] test_cython_api can fail if run in an environment where vcvarsall.bat has been run more than once

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2987.
-
Resolution: Won't Fix
  Assignee: Wes McKinney

For now our recommendation is that users do not run vcvarsall.bat more than 
once in the same shell before running this unit test

> [Python] test_cython_api can fail if run in an environment where 
> vcvarsall.bat has been run more than once
> --
>
> Key: ARROW-2987
> URL: https://issues.apache.org/jira/browse/ARROW-2987
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> I encountered this when verifying the 0.10.0 release on Windows
> {code}
> pyarrow/tests/test_cython.py::test_cython_api Compiling 
> pyarrow_cython_example.pyx because it changed.
> [1/1] Cythonizing pyarrow_cython_example.pyx
> running build_ext
> building 'pyarrow_cython_example' extension
> b'*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\r\x00\n\x00*\x00*\x00
>  \x00V\x00i\x00s\x00u\x00a\x00l\x00 \x00S\x00t\x00u\x00d\x00i\x00o\x00 
> \x002\x000\x001\x007\x00 \x00D\x00e\x00v\x00e\x00l\x00o\x00p\x00e\x00r\x00 
> \x00C\x00o\x00m\x00m\x00a\x00n\x00d\x00 \x00P\x00r\x00o\x00m\x00p\x00t\x00 
> \x00v\x001\x005\x00.\x004\x00.\x001\x00\r\x00\n\x00*\x00*\x00 
> \x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t\x00 \x00(\x00c\x00)\x00 
> \x002\x000\x001\x007\x00 \x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00 
> \x00C\x00o\x00r\x00p\x00o\x00r\x00a\x00t\x00i\x00o\x00n\x00\r\x00\n\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\r\x00\n\x00T\x00h\x00e\x00
>  \x00i\x00n\x00p\x00u\x00t\x00 \x00l\x00i\x00n\x00e\x00 \x00i\x00s\x00 
> \x00t\x00o\x00o\x00 \x00l\x00o\x00n\x00g\x00.\x00\r\x00\n\x00T\x00h\x00e\x00 
> \x00s\x00y\x00n\x00t\x00a\x00x\x00 \x00o\x00f\x00 \x00t\x00h\x00e\x00 
> \x00c\x00o\x00m\x00m\x00a\x00n\x00d\x00 \x00i\x00s\x00 
> \x00i\x00n\x00c\x00o\x00r\x00r\x00e\x00c\x00t\x00.\x00\r\x00\n\x00'
> error: Error executing cmd /u /c "C:\Program Files (x86)\Microsoft Visual 
> Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64 && set
> {code}
> Seems to be the same issue as 
> https://developercommunity.visualstudio.com/content/problem/257260/vcvarsallbat-reports-the-input-line-is-too-long-if.html
> I think the problem is that this unit test inherits the calling environment, 
> and distutils runs this command again which alters the environment further. 
> I'm not sure what's the best way to work around this yet



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2980) [C++] Dockerfile for running Infer static analysis

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2980:

Fix Version/s: (was: 0.12.0)

> [C++] Dockerfile for running Infer static analysis
> --
>
> Key: ARROW-2980
> URL: https://issues.apache.org/jira/browse/ARROW-2980
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Similar to ARROW-2952, it would be useful to periodically run Infer to find 
> problems that can be found with static analysis. This was added to the 
> codebase in ARROW-1626 
> https://github.com/apache/arrow/blob/master/cpp/build-support/run-infer.sh



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2904) [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687216#comment-16687216
 ] 

Wes McKinney commented on ARROW-2904:
-

This is also a bit tricky since the bitmap may experience a realloc during the 
course of building an array. This is non-urgent so moving off 0.12
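
To illustrate the general idea only: a sequential bitmap writer accumulates bits in a local byte and stores each byte once it is full, instead of doing a read-modify-write on the buffer for every bit. The class below is a hypothetical Python sketch, not Arrow's `FirstTimeBitmapWriter` and not its interface. It addresses the growing buffer through a `bytearray` rather than a raw pointer, which is why a reallocation does not invalidate it; a C++ writer holding a pointer into the bitmap would need to be re-seated after a resize.

```python
class SequentialBitmapWriter:
    """Append validity bits to a growing bytearray, one full byte at a time."""

    def __init__(self):
        self.buffer = bytearray()
        self.current = 0  # bits accumulated for the byte in progress
        self.bit_pos = 0  # next bit position within that byte (0..7)

    def append(self, is_valid):
        if is_valid:
            # Least-significant bit first, as in Arrow validity bitmaps.
            self.current |= 1 << self.bit_pos
        self.bit_pos += 1
        if self.bit_pos == 8:
            self._flush()

    def _flush(self):
        self.buffer.append(self.current)
        self.current = 0
        self.bit_pos = 0

    def finish(self):
        """Write out any partially filled byte and return the bitmap."""
        if self.bit_pos:
            self._flush()
        return bytes(self.buffer)


writer = SequentialBitmapWriter()
for value in [1, None, 3, 4, None, None, 7, 8, 9]:
    writer.append(value is not None)
print(writer.finish().hex())
# prints cd01
```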

> [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc
> ---
>
> Key: ARROW-2904
> URL: https://issues.apache.org/jira/browse/ARROW-2904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion in patch for ARROW-2826



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2904) [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2904:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc
> ---
>
> Key: ARROW-2904
> URL: https://issues.apache.org/jira/browse/ARROW-2904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion in patch for ARROW-2826



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3184) [C++] Add modular build targets, "all" target, and require explicit target when invoking make or ninja

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3184:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Add modular build targets, "all" target, and require explicit target 
> when invoking make or ninja
> --
>
> Key: ARROW-3184
> URL: https://issues.apache.org/jira/browse/ARROW-3184
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This will make it easier to build and install only part of the project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3162) [Python] Enable Flight servers to be implemented in pure Python

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3162:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Enable Flight servers to be implemented in pure Python
> ---
>
> Key: ARROW-3162
> URL: https://issues.apache.org/jira/browse/ARROW-3162
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> While it will be straightforward to offer a Flight client to Python users, 
> enabling _servers_ to be written _in Python_ will require a glue class to 
> invoke methods on a provided server implementation, coercing to and from 
> various Python objects and Arrow wrapper classes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3186) [GLib] mesonbuild failures in Travis CI

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687213#comment-16687213
 ] 

Wes McKinney commented on ARROW-3186:
-

[~kou] there have been a few releases since this issue originally occurred. 
Does the build work with the latest meson release now (0.48.2)?

> [GLib] mesonbuild failures in Travis CI
> ---
>
> Key: ARROW-3186
> URL: https://issues.apache.org/jira/browse/ARROW-3186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Something started breaking recently with mesonbuild
> {code}
> +env CFLAGS=-DARROW_NO_DEPRECATED_API CXXFLAGS=-DARROW_NO_DEPRECATED_API 
> meson build --prefix=/home/travis/build/apache/arrow/c-glib-install-meson 
> -Dgtk_doc=true
> Traceback (most recent call last):
>   File "/home/travis/miniconda/bin/meson", line 26, in <module>
> from mesonbuild import mesonmain
> ModuleNotFoundError: No module named 'mesonbuild'
> {code}
> Perhaps caused by the 8/25 release? https://pypi.org/project/meson/#history



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3150) [Python] Ship Flight-enabled Python wheels

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3150:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Ship Flight-enabled Python wheels
> --
>
> Key: ARROW-3150
> URL: https://issues.apache.org/jira/browse/ARROW-3150
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> This may involve statically linking (or bundling, where shared libs make 
> sense) the various required dependencies with {{libarrow_flight.so}} in the 
> manylinux1 wheel build



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3185) [C++] Address libparquet SO version convention in unified build

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3185:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Address libparquet SO version convention in unified build
> ---
>
> Key: ARROW-3185
> URL: https://issues.apache.org/jira/browse/ARROW-3185
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Follow up work to ARROW-3075



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2248:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Nightly or on-demand HDFS test builds
> --
>
> Key: ARROW-2248
> URL: https://issues.apache.org/jira/browse/ARROW-2248
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We continue to acquire more functionality related to HDFS and Parquet. 
> Testing this, including tests that involve interoperability with other 
> systems, like Spark, will require some work outside of our normal CI 
> infrastructure.
> I suggest we start with testing the C++/Python HDFS integration, which will 
> help with validating patches like ARROW-1643 
> https://github.com/apache/arrow/pull/1668



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3185) [C++] Address libparquet SO version convention in unified build

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687211#comment-16687211
 ] 

Wes McKinney commented on ARROW-3185:
-

It seems that we are using the same ABI version as libarrow.so. Does that 
sound reasonable to everyone?

> [C++] Address libparquet SO version convention in unified build
> ---
>
> Key: ARROW-3185
> URL: https://issues.apache.org/jira/browse/ARROW-3185
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Follow up work to ARROW-3075



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1688) [Java] Fail build on checkstyle warnings

2018-11-14 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687191#comment-16687191
 ] 

Bryan Cutler commented on ARROW-1688:
-

I think I can finish this up by 0.12, just need to enable Javadoc checks and 
then do a little writeup of the style changes

> [Java] Fail build on checkstyle warnings
> 
>
> Key: ARROW-1688
> URL: https://issues.apache.org/jira/browse/ARROW-1688
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 0.12.0
>
>
> see discussion in ARROW-1474
> My plan is to separate the stylecheck fixes into manageable chunks so they 
> are easier to review.  I'll do this by enabling the build to fail on style, 
> then suppressing all checks but the ones to be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2870) [Python] Define API for handling null markers from Array.to_numpy

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2870:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Define API for handling null markers from Array.to_numpy
> -
>
> Key: ARROW-2870
> URL: https://issues.apache.org/jira/browse/ARROW-2870
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This is follow-up work for {{Array.to_numpy}} started in ARROW-564



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3169) [C++] Break array-test.cc and array.cc into multiple compilation units

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3169:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Break array-test.cc and array.cc into multiple compilation units
> --
>
> Key: ARROW-3169
> URL: https://issues.apache.org/jira/browse/ARROW-3169
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> To improve readability I suggest splitting array-test.cc into multiple 
> compilation units (which will still be linked into an executable 
> {{array-test}}). We can do the same thing with array.h/array.cc, while 
> maintaining the {{arrow/array.h}} public header. Some of these could go into 
> {{arrow/columnar}} or {{arrow/impl}}, or something similar. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2870) [Python] Define API for handling null markers from Array.to_numpy

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687205#comment-16687205
 ] 

Wes McKinney commented on ARROW-2870:
-

We could either return a MaskedArray or a tuple of {{is_null[dtype=bool_], 
values}}
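
The tuple option could look roughly like this. Plain lists stand in for NumPy arrays so the sketch runs without dependencies; the function name `to_mask_and_values` and the `fill_value` parameter are hypothetical, and what should be stored in the null slots is exactly the kind of question this issue would need to settle.

```python
def to_mask_and_values(items, fill_value=0):
    """Split a nullable sequence into (is_null, values).

    Null slots in `values` are filled with `fill_value` (an assumption).
    """
    is_null = [item is None for item in items]
    values = [fill_value if item is None else item for item in items]
    return is_null, values


print(to_mask_and_values([1, None, 3]))
# prints ([False, True, False], [1, 0, 3])
```

A MaskedArray would carry the same two pieces of information in a single object, at the cost of tying callers to the masked-array API.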

> [Python] Define API for handling null markers from Array.to_numpy
> -
>
> Key: ARROW-2870
> URL: https://issues.apache.org/jira/browse/ARROW-2870
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> This is follow-up work for {{Array.to_numpy}} started in ARROW-564



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2681) [C++] Use source releases when building ORC instead of using GitHub tag snapshots

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2681:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++] Use source releases when building ORC instead of using GitHub tag 
> snapshots
> -
>
> Key: ARROW-2681
> URL: https://issues.apache.org/jira/browse/ARROW-2681
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
>
> See related discussion in ORC-374. It would be better to use the release 
> artifacts that have been voted on by the ORC PMC.





[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2652:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [C++/Python] Document how to provide information on segfaults
> -
>
> Key: ARROW-2652
> URL: https://issues.apache.org/jira/browse/ARROW-2652
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Documentation, Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 0.13.0
>
>
> Users often report segmentation faults in {{pyarrow}}. Sadly, these reports 
> will keep appearing, as we don't have the magical ability to write 
> 100%-bug-free code. We should therefore have a small section in our 
> documentation on how people can give us the relevant information when a 
> segmentation fault occurs. Preferably the documentation covers both {{gdb}} 
> and {{lldb}}, which have similar commands but differ in some minor flags.
> For an example of such a comment that I gave to a user in a ticket, see 
> https://github.com/apache/arrow/issues/2089#issuecomment-393477116
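
As a rough illustration of how the two debuggers differ in flags, the sketch below builds command lines that run a Python script under gdb or lldb and print a backtrace on a crash. The helper name backtrace_command and the script name crash.py are hypothetical, and the exact flags may vary between debugger versions, so treat this as a starting point rather than as the documentation the issue asks for.

```python
import sys


def backtrace_command(debugger, script, python=sys.executable):
    """Build an argv list that runs `script` under a debugger and prints a backtrace."""
    if debugger == "gdb":
        # -batch exits when done; each -ex runs one gdb command in order.
        return ["gdb", "-batch", "-ex", "run", "-ex", "bt",
                "--args", python, script]
    if debugger == "lldb":
        # -o runs a command after loading the target; -k runs a command
        # only if the process crashes while in batch mode.
        return ["lldb", "--batch", "-o", "run", "-k", "bt", "-k", "quit",
                "--", python, script]
    raise ValueError("unsupported debugger: %r" % (debugger,))


print(" ".join(backtrace_command("gdb", "crash.py", python="python")))
print(" ".join(backtrace_command("lldb", "crash.py", python="python")))
```

A user could paste the printed command into a shell and attach the resulting backtrace to a bug report.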





[jira] [Updated] (ARROW-2671) [Python] Run ASV suite in nightly build, only run in Travis CI on demand

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2671:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Python] Run ASV suite in nightly build, only run in Travis CI on demand
> 
>
> Key: ARROW-2671
> URL: https://issues.apache.org/jira/browse/ARROW-2671
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: nightly
> Fix For: 0.13.0
>
>
> Lately the main Travis CI build is running nearly 40 minutes long, e.g. here 
> is the latest commit on master
> https://travis-ci.org/apache/arrow/builds/387326546
> A fair chunk of the long runtime is spent running the Python benchmarks at 
> the end of the test suite. We should absolutely keep these running smoothly. 
> However:
> * It may be just as valuable to run them on master nightly, and report if 
> they are broken
> * We could add a check to look at the commit message and run them in Travis 
> CI if requested
> If others agree, I suggest that we make these changes as soon as the 
> packaging bot / nightly build tool is working properly, in the interest of 
> improving CI build times.
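
The commit-message check proposed above could look roughly like the sketch below. The marker string "[asv]" and the function name should_run_benchmarks are assumptions made for illustration; the issue does not specify what the commit message should contain. TRAVIS_COMMIT_MESSAGE is the environment variable Travis CI sets for the commit message.

```python
import os

# Hypothetical marker; the issue does not say what the message should contain.
BENCHMARK_MARKER = "[asv]"


def should_run_benchmarks(commit_message, is_nightly=False):
    """Run benchmarks on nightly builds, or when the commit message asks for them."""
    return is_nightly or BENCHMARK_MARKER in commit_message.lower()


if __name__ == "__main__":
    # Travis CI exposes the commit message as TRAVIS_COMMIT_MESSAGE.
    message = os.environ.get("TRAVIS_COMMIT_MESSAGE", "")
    print("run benchmarks:", should_run_benchmarks(message))
```

A CI script could call this before the benchmark step and skip it when the function returns False.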





[jira] [Updated] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2607:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
> ---
>
> Key: ARROW-2607
> URL: https://issues.apache.org/jira/browse/ARROW-2607
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.13.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
> only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
> {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
> functions to be able to deal with string arrays. There is a currently failing 
> unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to 
> verify the implementation.
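
For context on why string arrays need more than the primitive path: in the Arrow columnar format a string array carries an int32 offsets buffer in addition to the validity bitmap and the data buffer. The pure-Python sketch below decodes that layout; the function name decode_string_array is hypothetical and this is not how pyarrow implements from_buffers.

```python
import struct


def decode_string_array(offsets_buf, data_buf, validity_buf=None):
    """Decode an Arrow-style variable-width string layout into a Python list.

    `offsets_buf` holds n + 1 little-endian int32 offsets into `data_buf`;
    `validity_buf`, if given, is a bitmap with bit i set when slot i is valid.
    """
    n = len(offsets_buf) // 4 - 1
    offsets = struct.unpack("<%di" % (n + 1), offsets_buf)
    result = []
    for i in range(n):
        if validity_buf is not None and not (validity_buf[i // 8] >> (i % 8)) & 1:
            result.append(None)  # null slot
            continue
        result.append(bytes(data_buf[offsets[i]:offsets[i + 1]]).decode("utf-8"))
    return result


offsets = struct.pack("<4i", 0, 3, 3, 8)
print(decode_string_array(offsets, b"foohello", validity_buf=b"\x05"))
```

The three buffers (validity, offsets, data) are what a string-aware from_jvm would have to hand over, compared with two for primitive arrays.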



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2610:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2610
> URL: https://issues.apache.org/jira/browse/ARROW-2610
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.13.0
>
>
> The DictionaryType is a bit more complex as it also references the dictionary 
> values themselves. This also needs to be integrated into 
> {{pyarrow.Field.from_jvm}}, but the work to make DictionaryType work may 
> also depend on {{pyarrow.Array.from_jvm}} first supporting non-primitive 
> arrays.
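
To make the extra complexity concrete: dictionary-encoded data consists of integer indices plus a separate array of dictionary values that the type has to reference. The sketch below shows the idea in plain Python; dictionary_decode is a hypothetical helper, not a pyarrow function.

```python
def dictionary_decode(indices, dictionary):
    """Expand dictionary-encoded data: each index selects a dictionary value."""
    return [None if i is None else dictionary[i] for i in indices]


dictionary = ["apple", "banana"]   # the values the type has to reference
indices = [0, 1, None, 0]          # the array data itself; None marks a null
print(dictionary_decode(indices, dictionary))
```

Because the dictionary is itself an array (often of strings), converting the type from the JVM side requires converting that array too.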





[jira] [Commented] (ARROW-2038) [Python] Follow-up bug fixes for s3fs Parquet support

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687187#comment-16687187
 ] 

Wes McKinney commented on ARROW-2038:
-

Does anyone want to put the S3 support through its paces before 0.12 goes out?

> [Python] Follow-up bug fixes for s3fs Parquet support
> -
>
> Key: ARROW-2038
> URL: https://issues.apache.org/jira/browse/ARROW-2038
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: aws, parquet
> Fix For: 0.12.0
>
>
> see discussion in 
> https://github.com/apache/arrow/pull/916#issuecomment-360558248





[jira] [Updated] (ARROW-2501) [Java] Remove Jackson from compile-time dependencies for arrow-vector

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2501:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Java] Remove Jackson from compile-time dependencies for arrow-vector
> -
>
> Key: ARROW-2501
> URL: https://issues.apache.org/jira/browse/ARROW-2501
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Affects Versions: 0.9.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Minor
> Fix For: 0.13.0
>
>
> I would like to upgrade Jackson to the latest version (2.9.5). If there are 
> no objections I will create a PR (it is literally just changing the version 
> number in the pom - no code changes required).





[jira] [Updated] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2605:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
> -
>
> Key: ARROW-2605
> URL: https://issues.apache.org/jira/browse/ARROW-2605
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.13.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are 
> missing the necessary methods to construct these arrays conveniently on the 
> Python side.
> Once there is a path to construct {{pyarrow.Array}} instances from a Python 
> list of {{datetime.time}} for the various time types, we should activate the 
> time types on {{test_jvm_array}} and ensure that we load them correctly from 
> Java into Python.
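
One way to think about the missing construction path: Arrow's time32 types store seconds or milliseconds since midnight, and time64 stores microseconds or nanoseconds. The sketch below converts datetime.time values to those integer counts; time_to_int and UNITS_PER_SECOND are hypothetical names, and this is not the pyarrow conversion code.

```python
import datetime

# Multipliers from seconds to each unit used by the time32/time64 types.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def time_to_int(t, unit):
    """Convert a datetime.time to an integer count of `unit` since midnight."""
    micros = ((t.hour * 60 + t.minute) * 60 + t.second) * 10**6 + t.microsecond
    # Integer arithmetic only; units coarser than microseconds truncate.
    return micros * UNITS_PER_SECOND[unit] // 10**6


t = datetime.time(1, 2, 3, 456789)
print([time_to_int(t, u) for u in ("s", "ms", "us", "ns")])
```

A test for the time types could build expected values this way on the Python side and compare them with what is loaded from Java.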





[jira] [Commented] (ARROW-2461) [Python] Build wheels for manylinux2010 tag

2018-11-14 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687199#comment-16687199
 ] 

Wes McKinney commented on ARROW-2461:
-

I still think it wouldn't be _too_ evil to begin building with CentOS 6 as our 
base Linux for our wheels. gRPC is going to begin to use Abseil at some point 
from the look of the repo, and then we might be forced to do this anyway

> [Python] Build wheels for manylinux2010 tag
> ---
>
> Key: ARROW-2461
> URL: https://issues.apache.org/jira/browse/ARROW-2461
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Blocker
> Fix For: 0.13.0
>
>
> There is now work in progress on an updated manylinux tag based on CentOS 6. 
> We should provide wheels for this tag and the old {{manylinux1}} tag for one 
> release and then switch to the new tag in the release afterwards. This should 
> also enable us to raise the minimum compiler requirement to gcc 4.9 (or 
> higher once conda-forge has migrated to a newer compiler).
> The relevant PEP is https://www.python.org/dev/peps/pep-0571/





[jira] [Updated] (ARROW-2461) [Python] Build wheels for manylinux2010 tag

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2461:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Python] Build wheels for manylinux2010 tag
> ---
>
> Key: ARROW-2461
> URL: https://issues.apache.org/jira/browse/ARROW-2461
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Blocker
> Fix For: 0.13.0
>
>
> There is now work in progress on an updated manylinux tag based on CentOS 6. 
> We should provide wheels for this tag and the old {{manylinux1}} tag for one 
> release and then switch to the new tag in the release afterwards. This should 
> also enable us to raise the minimum compiler requirement to gcc 4.9 (or 
> higher once conda-forge has migrated to a newer compiler).
> The relevant PEP is https://www.python.org/dev/peps/pep-0571/





[jira] [Updated] (ARROW-2412) [Integration] Add nested dictionary integration test

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2412:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Integration] Add nested dictionary integration test
> 
>
> Key: ARROW-2412
> URL: https://issues.apache.org/jira/browse/ARROW-2412
> Project: Apache Arrow
> Issue Type: Task
> Components: Integration
> Reporter: Brian Hulette
> Priority: Major
> Fix For: 0.13.0
>
>
> Add nested dictionary generator to the integration test. The tests will 
> probably fail at first but can serve as a starting point for developing this 
> capability.





[jira] [Updated] (ARROW-2249) [Java/Python] in-process vector sharing from Java to Python

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2249:

Fix Version/s: 0.13.0 (was: 0.12.0)

> [Java/Python] in-process vector sharing from Java to Python
> ---
>
> Key: ARROW-2249
> URL: https://issues.apache.org/jira/browse/ARROW-2249
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: beginner
> Fix For: 0.13.0
>
>
> Currently, all applications of Arrow seem to use the IPC capabilities to 
> move data between a Java process and a Python process. While this requires 
> no serialization, it is not zero-copy. By taking the address and offset, we 
> can already create Python buffers from Java buffers: 
> https://github.com/apache/arrow/pull/1693. This is still a very low-level 
> interface and we should provide the user with:
> * A guide on how to load the Apache Arrow Java libraries in Python (either 
> through a fat-jar shipped with Arrow or by integrating them into their own 
> Java packaging)
> * {{pyarrow.Array.from_jvm}}, {{pyarrow.RecordBatch.from_jvm}}, … functions 
> that take the respective Java objects and emit Python objects. These Python 
> objects should also ensure that the underlying memory regions are kept alive 
> as long as the Python objects exist.
> This issue can also be used as a tracker for the various sub-tasks that will 
> need to be done to complete this rather large milestone.
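
The two requirements above (wrap memory by address without copying, and keep the owner alive) can be illustrated with the standard library alone. In the sketch below a ctypes buffer stands in for memory owned by the JVM; the ForeignBuffer class is hypothetical and is not the pyarrow API from the linked pull request.

```python
import ctypes


class ForeignBuffer:
    """Zero-copy view over `size` bytes at `address`, keeping `owner` alive."""

    def __init__(self, address, size, owner):
        self._owner = owner  # reference that keeps the memory region alive
        array = (ctypes.c_ubyte * size).from_address(address)
        self.view = memoryview(array)


# Stand-in for memory owned by another runtime (here: a ctypes buffer).
source = ctypes.create_string_buffer(b"arrow")
buf = ForeignBuffer(ctypes.addressof(source), 5, owner=source)
source[0] = b"A"                 # mutate the original memory...
print(buf.view.tobytes())        # ...and the view sees it, so nothing was copied
```

Holding the owner on the Python object is the simplest way to tie the lifetime of the foreign memory to the lifetime of the view.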




