[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791283#comment-16791283
 ] 

Wes McKinney commented on ARROW-4844:
-------------------------------------

[~jeroenooms] the vendoring in the build system right now is oriented around 
producing dependency-free shared libraries. If you are statically linking 
{{libarrow.a}}, the expectation is that you are managing your own library 
toolchain and providing all transitive dependencies (I think this is what HP 
Vertica is doing, for example). With the CMake refactor that Uwe is working on, 
we can arrange for the build system to refuse to vendor anything at all. Once 
the CMake refactor lands, we should document this aspect of the build system 
better so that it is clear what static linkers need to do to use {{libarrow.a}}.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Assignee: Uwe L. Korn
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1560) [C++] Kernel implementations for "match" function

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791243#comment-16791243
 ] 

Wes McKinney commented on ARROW-1560:
-------------------------------------

Yes, that sounds right to me.

> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Preeti Suman
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}
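The quoted example pins down the intended semantics. A minimal Python sketch of them (illustrative only, not Arrow's C++ kernel; treating unmatched values the same as null is an assumption here):

```python
def match(values, categories):
    # Map each category to its position in the dictionary of categories.
    index = {cat: i for i, cat in enumerate(categories)}
    # None (null) propagates to the output; values absent from the
    # categories also map to None in this sketch.
    return [index.get(v) if v is not None else None for v in values]

print(match(['a', 'b', 'a', None, 'b', 'a', 'b'], ['b', 'a']))
# → [1, 0, 1, None, 0, 1, 0]
```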





[jira] [Updated] (ARROW-4811) [C++] An incorrect dependency leads "ninja" to re-evaluate steps unnecessarily on subsequent calls

2019-03-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4811:
----------------------------------
Labels: pull-request-available  (was: )

> [C++] An incorrect dependency leads "ninja" to re-evaluate steps 
> unnecessarily on subsequent calls
> --
>
> Key: ARROW-4811
> URL: https://issues.apache.org/jira/browse/ARROW-4811
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Not sure about the root cause yet but here are the 5 steps that are 
> re-executing
> {code}
> $ ninja -v
> [1/5] /usr/bin/ccache /usr/bin/g++  -DARROW_EXTRA_ERROR_CONTEXT 
> -DARROW_JEMALLOC 
> -DARROW_JEMALLOC_INCLUDE_DIR=/home/wesm/code/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//include
>  -DARROW_NO_DEPRECATED_API -DARROW_PYTHON_EXPORTING -DARROW_USE_GLOG 
> -DARROW_USE_SIMD -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 
> -DARROW_WITH_SNAPPY -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -Isrc -I../src 
> -isystem /home/wesm/cpp-toolchain/include -isystem 
> gbenchmark_ep/src/gbenchmark_ep-install/include -isystem 
> jemalloc_ep-prefix/src -isystem ../thirdparty/hadoop/include -isystem 
> orc_ep-install/include -isystem /home/wesm/cpp-toolchain/include/thrift 
> -isystem 
> /home/wesm/miniconda/envs/arrow-3.7/lib/python3.7/site-packages/numpy/core/include
>  -isystem /home/wesm/miniconda/envs/arrow-3.7/include/python3.7m 
> -Wno-noexcept-type  -fdiagnostics-color=always -O3 -DNDEBUG  -Wall 
> -Wno-unused-variable -msse4.2 -fno-omit-frame-pointer -O3 -DNDEBUG -fPIC   
> -std=gnu++11 -MD -MT 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o -MF 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o.d -o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o -c 
> ../src/arrow/python/flight.cc
> [2/5] : && /usr/bin/ccache /home/wesm/miniconda/envs/arrow-3.7/bin/cmake -E 
> remove release/libarrow_python.a && /usr/bin/ccache /usr/bin/ar qc 
> release/libarrow_python.a  
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/arrow_to_pandas.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/benchmark.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/common.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/config.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/decimal.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/deserialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/helpers.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/inference.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/init.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/io.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_convert.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/pyarrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/serialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o && 
> /usr/bin/ccache /usr/bin/ranlib release/libarrow_python.a && :
> [3/5] : && /usr/bin/ccache /usr/bin/g++ -fPIC -Wno-noexcept-type  
> -fdiagnostics-color=always -O3 -DNDEBUG  -Wall -Wno-unused-variable -msse4.2 
> -fno-omit-frame-pointer -O3 -DNDEBUG   -shared 
> -Wl,-soname,libarrow_python.so.13 -o release/libarrow_python.so.13.0.0 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/arrow_to_pandas.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/benchmark.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/common.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/config.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/decimal.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/deserialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/helpers.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/inference.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/init.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/io.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_convert.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/pyarrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/serialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o  
> 

[jira] [Assigned] (ARROW-4811) [C++] An incorrect dependency leads "ninja" to re-evaluate steps unnecessarily on subsequent calls

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4811:
-----------------------------------

Assignee: Wes McKinney

> [C++] An incorrect dependency leads "ninja" to re-evaluate steps 
> unnecessarily on subsequent calls
> --
>
> Key: ARROW-4811
> URL: https://issues.apache.org/jira/browse/ARROW-4811
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Not sure about the root cause yet but here are the 5 steps that are 
> re-executing
> {code}
> $ ninja -v
> [1/5] /usr/bin/ccache /usr/bin/g++  -DARROW_EXTRA_ERROR_CONTEXT 
> -DARROW_JEMALLOC 
> -DARROW_JEMALLOC_INCLUDE_DIR=/home/wesm/code/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//include
>  -DARROW_NO_DEPRECATED_API -DARROW_PYTHON_EXPORTING -DARROW_USE_GLOG 
> -DARROW_USE_SIMD -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 
> -DARROW_WITH_SNAPPY -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -Isrc -I../src 
> -isystem /home/wesm/cpp-toolchain/include -isystem 
> gbenchmark_ep/src/gbenchmark_ep-install/include -isystem 
> jemalloc_ep-prefix/src -isystem ../thirdparty/hadoop/include -isystem 
> orc_ep-install/include -isystem /home/wesm/cpp-toolchain/include/thrift 
> -isystem 
> /home/wesm/miniconda/envs/arrow-3.7/lib/python3.7/site-packages/numpy/core/include
>  -isystem /home/wesm/miniconda/envs/arrow-3.7/include/python3.7m 
> -Wno-noexcept-type  -fdiagnostics-color=always -O3 -DNDEBUG  -Wall 
> -Wno-unused-variable -msse4.2 -fno-omit-frame-pointer -O3 -DNDEBUG -fPIC   
> -std=gnu++11 -MD -MT 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o -MF 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o.d -o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o -c 
> ../src/arrow/python/flight.cc
> [2/5] : && /usr/bin/ccache /home/wesm/miniconda/envs/arrow-3.7/bin/cmake -E 
> remove release/libarrow_python.a && /usr/bin/ccache /usr/bin/ar qc 
> release/libarrow_python.a  
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/arrow_to_pandas.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/benchmark.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/common.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/config.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/decimal.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/deserialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/helpers.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/inference.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/init.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/io.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_convert.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/pyarrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/serialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o && 
> /usr/bin/ccache /usr/bin/ranlib release/libarrow_python.a && :
> [3/5] : && /usr/bin/ccache /usr/bin/g++ -fPIC -Wno-noexcept-type  
> -fdiagnostics-color=always -O3 -DNDEBUG  -Wall -Wno-unused-variable -msse4.2 
> -fno-omit-frame-pointer -O3 -DNDEBUG   -shared 
> -Wl,-soname,libarrow_python.so.13 -o release/libarrow_python.so.13.0.0 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/arrow_to_pandas.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/benchmark.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/common.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/config.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/decimal.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/deserialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/helpers.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/inference.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/init.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/io.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_convert.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/numpy_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/pyarrow.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/serialize.cc.o 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/flight.cc.o  
> -Wl,-rpath,/home/wesm/code/arrow/cpp/build/release:/home/wesm/cpp-toolchain/lib:
>  -lpthread -ldl 

[jira] [Assigned] (ARROW-4645) [C++/Packaging] Ship Gandiva with OSX and Windows wheels

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4645:
-----------------------------------

Assignee: Krisztian Szucs

> [C++/Packaging] Ship Gandiva with OSX and Windows wheels
> 
>
> Key: ARROW-4645
> URL: https://issues.apache.org/jira/browse/ARROW-4645
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva, Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Gandiva is only installed via the Linux wheels. We should support it on all 
> platforms.





[jira] [Created] (ARROW-4850) [CI] Integration test failures do not fail the Travis CI build

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4850:
---

 Summary: [CI] Integration test failures do not fail the Travis CI 
build
 Key: ARROW-4850
 URL: https://issues.apache.org/jira/browse/ARROW-4850
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Wes McKinney
 Fix For: 0.13.0


See https://github.com/apache/arrow/pull/3871

These changes fail the build, but the build is reported as a success.

The errors can be seen in https://travis-ci.org/apache/arrow/jobs/505028161





[jira] [Resolved] (ARROW-4335) [C++] Better document sparse tensor support

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4335.
---------------------------------
Resolution: Fixed

Issue resolved by pull request 3810
[https://github.com/apache/arrow/pull/3810]

> [C++] Better document sparse tensor support
> ---
>
> Key: ARROW-4335
> URL: https://issues.apache.org/jira/browse/ARROW-4335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently the documentation (including docstrings) for the sparse tensor 
> classes and methods is very... sparse. It would be nice to make those 
> approachable.
> (also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
> perhaps {{SparseCSRIndex::row_indices()}}?)





[jira] [Assigned] (ARROW-295) Create DOAP File

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-295:
----------------------------------

Assignee: Bruno P. Kinoshita

> Create DOAP File
> 
>
> Key: ARROW-295
> URL: https://issues.apache.org/jira/browse/ARROW-295
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Bruno P. Kinoshita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-295) Create DOAP File

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-295.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3307
[https://github.com/apache/arrow/pull/3307]

> Create DOAP File
> 
>
> Key: ARROW-295
> URL: https://issues.apache.org/jira/browse/ARROW-295
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-4789) [C++] Deprecate and later remove arrow::io::ReadableFileInterface

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4789.
---------------------------------
Resolution: Fixed

Issue resolved by pull request 3831
[https://github.com/apache/arrow/pull/3831]

> [C++] Deprecate and later remove arrow::io::ReadableFileInterface
> -----------------------------------------------------------------
>
> Key: ARROW-4789
> URL: https://issues.apache.org/jira/browse/ARROW-4789
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See arrow/io/interfaces.h. This is a legacy alias.





[jira] [Resolved] (ARROW-4421) [Flight][C++] Handle large Flight data messages

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4421.
---------------------------------
Resolution: Fixed

Issue resolved by pull request 3878
[https://github.com/apache/arrow/pull/3878]

> [Flight][C++] Handle large Flight data messages
> ---
>
> Key: ARROW-4421
> URL: https://issues.apache.org/jira/browse/ARROW-4421
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I believe the message payloads are currently limited to 4MB by default, see 
> one developer's discussion here:
> https://nanxiao.me/en/message-length-setting-in-grpc/
> While it is a good idea to break large messages into smaller ones, we will 
> need to address how to gracefully send larger payloads that may be provided 
> by a user's server implementation. Either we can increase the limit or break 
> up the record batches into smaller chunks in the Flight server base (or both, 
> of course).
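The chunking option the comment describes can be sketched as follows (a byte-level sketch with hypothetical names; a real Flight server would split on record-batch boundaries rather than raw bytes, and 4 MB is gRPC's default receive limit):

```python
GRPC_DEFAULT_MAX_MESSAGE_BYTES = 4 * 1024 * 1024  # gRPC's default max receive size

def chunk_payload(payload: bytes, max_size: int = GRPC_DEFAULT_MAX_MESSAGE_BYTES):
    """Split one oversized payload into messages that each fit under the limit."""
    return [payload[i:i + max_size] for i in range(0, len(payload), max_size)]

# A 9 MiB payload becomes three messages, none exceeding the 4 MiB cap.
chunks = chunk_payload(b"x" * (9 * 1024 * 1024))
print([len(c) for c in chunks])
```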





[jira] [Updated] (ARROW-3364) [Doc] Document docker compose setup

2019-03-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3364:
----------------------------------
Labels: pull-request-available  (was: )

> [Doc] Document docker compose setup
> ---
>
> Key: ARROW-3364
> URL: https://issues.apache.org/jira/browse/ARROW-3364
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Krisztian Szucs
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Introduced by https://github.com/apache/arrow/pull/2572





[jira] [Commented] (ARROW-4383) [C++] Use the CMake's standard find features

2019-03-12 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791153#comment-16791153
 ] 

Kouhei Sutou commented on ARROW-4383:
-------------------------------------

I think that ARROW-4611 will resolve this too.
I'll check this again after ARROW-4611 is merged.

> [C++] Use the CMake's standard find features
> 
>
> Key: ARROW-4383
> URL: https://issues.apache.org/jira/browse/ARROW-4383
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> From https://github.com/apache/arrow/pull/3469#discussion_r250862542
> We implement our custom find codes to find libraries by 
> {{find_library()}}/{{find_path()}} with {{NO_DEFAULT_PATH}}. So we need to 
> handle {{lib64/}} (on Red Hat) and {{lib/x86_64-linux-gnu/}} (on Debian) 
> paths manually.
> If we use the CMake's standard find features such as {{CMAKE_PREFIX_PATH}} 
> https://cmake.org/cmake/help/v3.13/variable/CMAKE_PREFIX_PATH.html#variable:CMAKE_PREFIX_PATH
>  , we can remove our custom find codes.
> CMake has a package-specific find path feature, {{<PackageName>_ROOT}}, since 
> CMake 3.12. It's the equivalent of our {{<PACKAGE>_HOME}} variables: 
> https://cmake.org/cmake/help/v3.12/command/find_library.html
> {quote}
> If called from within a find module loaded by 
> {{find_package(<PackageName>)}}, search prefixes unique to the current 
> package being found. Specifically look in the {{<PackageName>_ROOT}} CMake 
> variable and the {{<PackageName>_ROOT}} environment variable. The package 
> root variables are maintained as a stack so if called from nested find 
> modules, root paths from the parent's find module will be searched after 
> paths from the current module, i.e. {{<PackageName>_ROOT}}, 
> {{ENV{<PackageName>_ROOT}}}, {{<ParentPackage>_ROOT}}, 
> {{ENV{<ParentPackage>_ROOT}}}, etc.
> {quote}





[jira] [Resolved] (ARROW-4835) [GLib] Add boolean operations

2019-03-12 Thread Yosuke Shiro (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro resolved ARROW-4835.
---------------------------------
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3873
[https://github.com/apache/arrow/pull/3873]

> [GLib] Add boolean operations
> -
>
> Key: ARROW-4835
> URL: https://issues.apache.org/jira/browse/ARROW-4835
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791069#comment-16791069
 ] 

Jeroen commented on ARROW-4844:
-------------------------------

OK thank you. FWIW: I was able to make the R bindings work by building an 
external libdouble-conversion, and then rebuilding libarrow using this external 
library by setting  -DDOUBLE_CONVERSION_HOME. By avoiding the vendored library, 
I can now statically link with libarrow by also including `-ldouble-conversion`.

Still, I do think that when arrow does get built with vendored libs, those 
should be shipped one way or another. Otherwise there is no way to link with 
the resulting libarrow library.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Assignee: Uwe L. Korn
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}





[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791069#comment-16791069
 ] 

Jeroen edited comment on ARROW-4844 at 3/12/19 10:42 PM:
-------------------------------------------------------

OK thank you. FWIW: I was able to make the R bindings work by building an 
external libdouble-conversion, and then rebuilding libarrow using this external 
library by setting  -DDOUBLE_CONVERSION_HOME. By avoiding the vendored library, 
I can now statically link against libarrow by also including 
`-ldouble-conversion`. So I have a workaround.

Still, I do think that when arrow does get built with vendored libs, those 
should be shipped one way or another. Otherwise there is no way to link with 
the resulting libarrow library.


was (Author: jeroenooms):
OK thank you. FWIW: I was able to make the R bindings work by building an 
external libdouble-conversion, and then rebuilding libarrow using this external 
library by setting  -DDOUBLE_CONVERSION_HOME. By avoiding the vendored library, 
I can now statically link against libarrow by also including 
`-ldouble-conversion`.

Still, I do think that when arrow does get built with vendored libs, those 
should be shipped one way or another. Otherwise there is no way to link with 
the resulting libarrow library.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Assignee: Uwe L. Korn
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}





[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791069#comment-16791069
 ] 

Jeroen edited comment on ARROW-4844 at 3/12/19 10:42 PM:
-------------------------------------------------------

OK thank you. FWIW: I was able to make the R bindings work by building an 
external libdouble-conversion, and then rebuilding libarrow using this external 
library by setting  -DDOUBLE_CONVERSION_HOME. By avoiding the vendored library, 
I can now statically link against libarrow by also including 
`-ldouble-conversion`.

Still, I do think that when arrow does get built with vendored libs, those 
should be shipped one way or another. Otherwise there is no way to link with 
the resulting libarrow library.


was (Author: jeroenooms):
OK thank you. FWIW: I was able to make the R bindings work by building an 
external libdouble-conversion, and then rebuilding libarrow using this external 
library by setting  -DDOUBLE_CONVERSION_HOME. By avoiding the vendored library, 
I can now statically link with libarrow by also including `-ldouble-conversion`.

Still, I do think that when arrow does get built with vendored libs, those 
should be shipped one way or another. Otherwise there is no way to link with 
the resulting libarrow library.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Assignee: Uwe L. Korn
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}





[jira] [Commented] (ARROW-4492) [Python] Failure reading Parquet column as pandas Categorical in 0.12

2019-03-12 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791057#comment-16791057
 ] 

Hatem Helal commented on ARROW-4492:


[~wesmckinn], is this the same issue as ARROW-3325?

> [Python] Failure reading Parquet column as pandas Categorical in 0.12
> -
>
> Key: ARROW-4492
> URL: https://issues.apache.org/jira/browse/ARROW-4492
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: George Sakkis
>Priority: Major
>  Labels: Parquet
> Fix For: 0.13.0
>
> Attachments: slug.pq
>
>
> On pyarrow 0.12.0 some (but not all) columns cannot be read as category 
> dtype. Attached is an extracted failing sample.
>  {noformat}
> import dask.dataframe as dd
> df = dd.read_parquet('slug.pq', categories=['slug'], 
> engine='pyarrow').compute()
> print(len(df['slug'].dtype.categories))
>  {noformat}
> This works on pyarrow 0.11.1 (and fastparquet 0.2.1).





[jira] [Created] (ARROW-4849) [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages

2019-03-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4849:
--

 Summary: [C++] Add docker-compose entry for testing Ubuntu Bionic 
build with system packages
 Key: ARROW-4849
 URL: https://issues.apache.org/jira/browse/ARROW-4849
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


To better support people on Ubuntu, and also to show what is missing to get 
Arrow packaged into Fedora, add an entry to the docker-compose.yml that builds 
on Ubuntu.





[jira] [Assigned] (ARROW-4596) [Rust] [DataFusion] Implement COUNT aggregate function

2019-03-12 Thread Luca Palmieri (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Palmieri reassigned ARROW-4596:


Assignee: Luca Palmieri  (was: Nicolas Trinquier)

> [Rust] [DataFusion] Implement COUNT aggregate function
> --
>
> Key: ARROW-4596
> URL: https://issues.apache.org/jira/browse/ARROW-4596
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Luca Palmieri
>Priority: Major
>  Labels: newbie
> Fix For: 0.13.0
>
>
> Add support for COUNT aggregate function. See SUM function in aggregates.rs 
> for inspiration.
> SQL parser should support COUNT(1) and COUNT ( * ) syntax for counting rows 
> as well as COUNT(column_name) for counting non-null values for a column.
> COUNT(DISTINCT expr) should be handled in a separate issue since it is more 
> complex.





[jira] [Resolved] (ARROW-4735) [Go] Benchmark strconv.Format vs. fmt.Sprintf for CSV writer

2019-03-12 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-4735.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3879
[https://github.com/apache/arrow/pull/3879]

> [Go] Benchmark strconv.Format vs. fmt.Sprintf for CSV writer
> 
>
> Key: ARROW-4735
> URL: https://issues.apache.org/jira/browse/ARROW-4735
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Anson Qian
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Need to test out strconv.Format\{Bool,Float,Int,Uint} instead of fmt.Sprintf and 
> see if we can improve write performance.





[jira] [Assigned] (ARROW-4825) [Python][C++] MemoryPool is destructed before deallocating its buffers leads to segfault

2019-03-12 Thread Pearu Peterson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pearu Peterson reassigned ARROW-4825:
-

Assignee: Pearu Peterson

> [Python][C++] MemoryPool is destructed before deallocating its buffers leads 
> to segfault 
> -
>
> Key: ARROW-4825
> URL: https://issues.apache.org/jira/browse/ARROW-4825
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.13.0
>Reporter: Pearu Peterson
>Assignee: Pearu Peterson
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider the following test function:
> ```
> def test_memory_pool():
>     import pyarrow as pa
>     pool = pa.logging_memory_pool(pa.default_memory_pool())
>     buf = pa.allocate_buffer(10, memory_pool=pool)
> ```
> that will fail with a segfault when `pool` is garbage collected before `buf`. 
> However, the following test function succeeds:
> ```
> def test_memory_pool():
>     import pyarrow as pa
>     pool = pa.logging_memory_pool(pa.default_memory_pool())
>     buf = pa.allocate_buffer(10, memory_pool=pool)
>     del buf
> ```
> because all buffers are freed before `pool` destruction.
> To fix this issue, the pool instance should be attached to buffer instances 
> that the pool is creating. This will ensure that `pool` will be alive until 
> all its buffers are destroyed.





[jira] [Created] (ARROW-4848) Static libparquet not compiled with -DARROW_STATIC on Windows

2019-03-12 Thread Jeroen (JIRA)
Jeroen created ARROW-4848:
-

 Summary: Static libparquet not compiled with -DARROW_STATIC on 
Windows
 Key: ARROW-4848
 URL: https://issues.apache.org/jira/browse/ARROW-4848
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.1
Reporter: Jeroen


When trying to link the R bindings against static libparquet.a + libarrow.a, we 
get a lot of missing arrow symbol errors from libparquet.a. I think the 
problem is that libparquet.a was not compiled with -DARROW_STATIC, and therefore 
cannot be linked against libarrow.a.

When the arrow cmake build is configured with -DARROW_BUILD_SHARED=OFF, I think it 
should automatically use -DARROW_STATIC when compiling libparquet on Windows?
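
For context, a sketch of why the define matters on Windows: Arrow's export macros switch between DLL-import annotations and plain declarations based on ARROW_STATIC. The compile lines and file name below are illustrative assumptions, not the actual parquet build commands:

```shell
# With ARROW_STATIC defined, arrow headers declare symbols plainly, so the
# resulting object can resolve them directly from libarrow.a:
g++ -DARROW_STATIC -c reader.cc -o reader.o

# Without it, the headers use __declspec(dllimport) on Windows, producing
# references to __imp_-prefixed symbols that only a DLL import library
# provides -- linking such objects against libarrow.a then fails.
g++ -c reader.cc -o reader_dllimport.o
```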






[jira] [Created] (ARROW-4847) [Python] Add pyarrow.table factory function that dispatches to various ctors based on type of input

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4847:
---

 Summary: [Python] Add pyarrow.table factory function that 
dispatches to various ctors based on type of input
 Key: ARROW-4847
 URL: https://issues.apache.org/jira/browse/ARROW-4847
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.13.0


For example, in {{pyarrow.table(df)}} if {{df}} is a {{pandas.DataFrame}}, then 
table will dispatch to {{pa.Table.from_pandas}}





[jira] [Updated] (ARROW-4846) [Java] Update Jackson to 2.9.8

2019-03-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4846:
--
Labels: pull-request-available  (was: )

> [Java] Update Jackson to 2.9.8
> --
>
> Key: ARROW-4846
> URL: https://issues.apache.org/jira/browse/ARROW-4846
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> We are looking at removing Jackson from arrow-vector dependencies in 
> ARROW-2501





[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791011#comment-16791011
 ] 

Uwe L. Korn commented on ARROW-4844:


To solve this, it is probably best to build the {{libarrow.a}} for R with 
{{ARROW_DEPENDENCY_SOURCE=BUNDLED}} (see 
[https://github.com/apache/arrow/pull/3688]) and add a new flag 
{{ARROW_WHOLE_ARCHIVE}} that passes {{--whole-archive}} to the linker. I can have 
a look after 3688 is merged.
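
The proposed flag would roughly correspond to a link line like the following (a sketch: {{ARROW_WHOLE_ARCHIVE}} does not exist yet, and the paths are assumptions):

```shell
# Force the linker to keep every object in libarrow.a -- including the
# bundled double-conversion objects -- instead of only those that satisfy
# currently undefined symbols.
g++ bindings.o -o bindings \
    -Wl,--whole-archive /usr/local/lib/libarrow.a -Wl,--no-whole-archive
```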

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}





[jira] [Resolved] (ARROW-4724) [C++] Python not being built nor test under MinGW builds

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4724.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3869
[https://github.com/apache/arrow/pull/3869]

> [C++] Python not being built nor test under MinGW builds
> 
>
> Key: ARROW-4724
> URL: https://issues.apache.org/jira/browse/ARROW-4724
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Javier Luraschi
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up needed for 
> [arrow/pull/3693/files|https://github.com/apache/arrow/pull/3693/files].
> appveyor-cpp-build-mingw.bat has not yet enabled Python tests; we need to revert
> -DARROW_PYTHON=OFF
> The suggestion was to use:
> {code:java}
> diff --git a/ci/appveyor-cpp-build-mingw.bat b/ci/appveyor-cpp-build-mingw.bat
> index 06e8b7f7..3a853031 100644
> --- a/ci/appveyor-cpp-build-mingw.bat
> +++ b/ci/appveyor-cpp-build-mingw.bat
> @@ -24,6 +24,15 @@ set INSTALL_DIR=%HOMEDRIVE%%HOMEPATH%\install
> set PATH=%INSTALL_DIR%\bin;%PATH%
> set PKG_CONFIG_PATH=%INSTALL_DIR%\lib\pkgconfig
> +for /f "usebackq" %%v in (`python3 -c "import sys; print('.'.join(map(str, 
> sys.version_info[0:2])))"`) do (
> + set PYTHON_VERSION=%%v
> +)
> +
> +set PYTHONHOME=%MINGW_PREFIX%\lib\python%PYTHON_VERSION%
> +set PYTHONPATH=%PYTHONHOME%
> +set 
> PYTHONPATH=%PYTHONPATH%;%MINGW_PREFIX%\lib\python%PYTHON_VERSION%\lib-dynload
> +set 
> PYTHONPATH=%PYTHONPATH%;%MINGW_PREFIX%\lib\python%PYTHON_VERSION%\site-packages
> +
> {code}
> However, this suggestion currently triggers a build error in Travis:
> {code:java}
> [ 43%] Building CXX object 
> src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json-simple.cc.obj
> [ 44%] Building CXX object 
> src/arrow/CMakeFiles/arrow_objlib.dir/ipc/message.cc.obj
> [ 44%] Building CXX object 
> src/arrow/CMakeFiles/arrow_objlib.dir/ipc/metadata-internal.cc.obj
> [ 45%] Building CXX object 
> src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.obj
> [ 45%] Building CXX object 
> src/arrow/CMakeFiles/arrow_objlib.dir/ipc/writer.cc.obj
> [ 45%] Built target arrow_objlib
> make: *** [Makefile:141: all] Error 2
> C:\projects\arrow\cpp\build>goto scriptexit{code}
> Therefore, additional investigation is needed.





[jira] [Resolved] (ARROW-4846) [Java] Update Jackson to 2.9.8

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4846.
-
Resolution: Fixed

Issue resolved by pull request 3877
[https://github.com/apache/arrow/pull/3877]

> [Java] Update Jackson to 2.9.8
> --
>
> Key: ARROW-4846
> URL: https://issues.apache.org/jira/browse/ARROW-4846
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are looking at removing Jackson from arrow-vector dependencies in 
> ARROW-2501





[jira] [Created] (ARROW-4846) [Java] Update Jackson to 2.9.8

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4846:
---

 Summary: [Java] Update Jackson to 2.9.8
 Key: ARROW-4846
 URL: https://issues.apache.org/jira/browse/ARROW-4846
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Wes McKinney
Assignee: Andy Grove
 Fix For: 0.13.0


We are looking at removing Jackson from arrow-vector dependencies in ARROW-2501





[jira] [Commented] (ARROW-4842) [C++] Persist CMake options in pkg-config files

2019-03-12 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791017#comment-16791017
 ] 

Kouhei Sutou commented on ARROW-4842:
-

Hmm. It seems that this is not a good idea.

For {{ARROW_WITH_ZSTD}}, we should generate {{arrow/feature.h}} by 
{{configure_file}} instead of putting {{-DARROW_WITH_ZSTD}}. (This will be done 
by ARROW-4840?)

For {{ARROW_PARQUET}}, users should use {{pkg-config --cflags --libs parquet}}. 
If the command returns non-zero, {{ARROW_PARQUET}} was {{OFF}}. (This is 
already done by providing separated {{.pc}} such as {{arrow.pc}}, 
{{parquet.pc}} and {{arrow-orc.pc}}.)
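
The detection scheme described above can be sketched in shell; the package name matches the {{parquet.pc}} file mentioned, and the variable names are illustrative:

```shell
# Probe optional Arrow components via their separate pkg-config files.
if pkg-config --exists parquet; then
  PARQUET_CFLAGS=$(pkg-config --cflags parquet)
  PARQUET_LIBS=$(pkg-config --libs parquet)
else
  # A non-zero exit means Arrow was built with ARROW_PARQUET=OFF
  # (or parquet.pc is not on PKG_CONFIG_PATH).
  echo "Parquet support not available" >&2
fi
```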

> [C++] Persist CMake options in pkg-config files
> ---
>
> Key: ARROW-4842
> URL: https://issues.apache.org/jira/browse/ARROW-4842
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> Persist options like ARROW_WITH_ZSTD in {{arrow.pc}} so libraries can 
> determine which features are available.





[jira] [Updated] (ARROW-4374) [C++] DictionaryBuilder does not correctly report length and null_count

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4374:

Component/s: C++

> [C++] DictionaryBuilder does not correctly report length and null_count
> ---
>
> Key: ARROW-4374
> URL: https://issues.apache.org/jira/browse/ARROW-4374
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unlike in most other builders, the reported length and null_count stay constantly at 0.





[jira] [Assigned] (ARROW-4637) [Python] Avoid importing Pandas unless necessary

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4637:
---

Assignee: Wes McKinney

> [Python] Avoid importing Pandas unless necessary
> 
>
> Key: ARROW-4637
> URL: https://issues.apache.org/jira/browse/ARROW-4637
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Minor
> Fix For: 0.13.0
>
>
> Importing PyArrow is more than twice as slow when Pandas is installed:
> {code}
> $ time python -c "import pyarrow"
> real  0m0,360s
> user  0m0,305s
> sys   0m0,037s
> $ time python -c "import sys; sys.modules['pandas'] = None; import pyarrow"
> real  0m0,144s
> user  0m0,124s
> sys   0m0,020s
> {code}
> We should only import Pandas when necessary, e.g. when asked to ingest or 
> create Pandas data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2119:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like many places in the code currently assume that there is at least 
> one record batch in the streaming format. Are zero-record-batch streams 
> unsupported by design?
> e.g. 
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
>   public static void convert(InputStream in, OutputStream out) throws 
> IOException {
> BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
> try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>   VectorSchemaRoot root = reader.getVectorSchemaRoot();
>   // load the first batch before instantiating the writer so that we have 
> any dictionaries
>   if (!reader.loadNextBatch()) {
> throw new IOException("Unable to read first record batch");
>   }
>   ...
> {code}
> Pyarrow-0.8.0 does not load a zero-record-batch stream either. It throws an 
> exception originating from 
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309]
> {code:none}
> Status Table::FromRecordBatches(const 
> std::vector<std::shared_ptr<RecordBatch>>& batches,
> std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
> return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}





[jira] [Closed] (ARROW-4007) [Java][Plasma] Plasma JNI tests failing

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4007.
---
Resolution: Cannot Reproduce

> [Java][Plasma] Plasma JNI tests failing
> ---
>
> Key: ARROW-4007
> URL: https://issues.apache.org/jira/browse/ARROW-4007
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Priority: Critical
>  Labels: ci-failure
> Fix For: 0.13.0
>
>
> see https://travis-ci.org/apache/arrow/jobs/466819720
> {code}
> [INFO] Total time: 10.633 s
> [INFO] Finished at: 2018-12-12T03:56:33Z
> [INFO] Final Memory: 39M/426M
> [INFO] 
> 
>   linux-vdso.so.1 =>  (0x7ffcff172000)
>   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f99ecd9e000)
>   libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f99ecb85000)
>   libboost_system.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 (0x7f99ec981000)
>   libboost_filesystem.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.54.0 (0x7f99ec76b000)
>   libboost_regex.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x7f99ec464000)
>   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
> (0x7f99ec246000)
>   libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
> (0x7f99ebf3)
>   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f99ebc2a000)
>   libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
> (0x7f99eba12000)
>   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f99eb649000)
>   libicuuc.so.52 => /usr/lib/x86_64-linux-gnu/libicuuc.so.52 
> (0x7f99eb2d)
>   libicui18n.so.52 => /usr/lib/x86_64-linux-gnu/libicui18n.so.52 
> (0x7f99eaec9000)
>   /lib64/ld-linux-x86-64.so.2 (0x7f99ecfa6000)
>   libicudata.so.52 => /usr/lib/x86_64-linux-gnu/libicudata.so.52 
> (0x7f99e965c000)
>   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f99e9458000)
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:985: Allowing the 
> Plasma store to use up to 0.01GB of memory.
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:1015: Starting object 
> store with directory /dev/shm and huge page support disabled
> Start process 317574433 OK, cmd = 
> [/home/travis/build/apache/arrow/cpp-install/bin/plasma_store_server  -s  
> /tmp/store89237  -m  1000]
> Start object store success
> Start test.
> Plasma java client put test success.
> Plasma java client get single object test success.
> Plasma java client get multi-object test success.
> ObjectId [B@34c45dca error at PlasmaClient put
> java.lang.Exception: An object with this ID already exists in the plasma 
> store.
>   at org.apache.arrow.plasma.PlasmaClientJNI.create(Native Method)
>   at org.apache.arrow.plasma.PlasmaClient.put(PlasmaClient.java:51)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.doTest(PlasmaClientTest.java:145)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.main(PlasmaClientTest.java:220)
> Plasma java client put same object twice exception test success.
> Plasma java client hash test success.
> Plasma java client contains test success.
> Plasma java client metadata get test success.
> Plasma java client delete test success.
> Kill plasma store process forcely
> All test success.
> ~/build/apache/arrow
> {code}
> I didn't see any related code changes





[jira] [Commented] (ARROW-4356) [CI] Add integration (docker) test for turbodbc

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791013#comment-16791013
 ] 

Wes McKinney commented on ARROW-4356:
-

Leaving it in for 0.13 then

> [CI] Add integration (docker) test for turbodbc
> ---
>
> Key: ARROW-4356
> URL: https://issues.apache.org/jira/browse/ARROW-4356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> We regularly break our API so that {{turbodbc}} needs to make minor changes 
> to support the new Arrow version. We should set up a small integration test to 
> check before a release that {{turbodbc}} can easily upgrade.





[jira] [Updated] (ARROW-2523) [Rust] Implement CAST operations for arrays

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2523:

Component/s: Rust

> [Rust] Implement CAST operations for arrays
> ---
>
> Key: ARROW-2523
> URL: https://issues.apache.org/jira/browse/ARROW-2523
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.13.0
>
>
> I have implemented CAST operations in DataFusion but I would like to 
> re-implement this now directly in Arrow. I will create a PR after the Rust 
> refactor is complete.





[jira] [Assigned] (ARROW-4094) [Python] Store RangeIndex in Parquet files as metadata rather than a physical data column

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4094:
---

Assignee: Wes McKinney

> [Python] Store RangeIndex in Parquet files as metadata rather than a physical 
> data column
> -
>
> Key: ARROW-4094
> URL: https://issues.apache.org/jira/browse/ARROW-4094
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This storage is wasteful in a lot of cases; we can define metadata for the 
> RangeIndex and store it in the "pandas" metadata field





[jira] [Updated] (ARROW-4383) [C++] Use the CMake's standard find features

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4383:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Use the CMake's standard find features
> 
>
> Key: ARROW-4383
> URL: https://issues.apache.org/jira/browse/ARROW-4383
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> From https://github.com/apache/arrow/pull/3469#discussion_r250862542
> We implement our custom find codes to find libraries by 
> {{find_library()}}/{{find_path()}} with {{NO_DEFAULT_PATH}}. So we need to 
> handle {{lib64/}} (on Red Hat) and {{lib/x86_64-linux-gnu/}} (on Debian) 
> paths manually.
> If we use the CMake's standard find features such as {{CMAKE_PREFIX_PATH}} 
> https://cmake.org/cmake/help/v3.13/variable/CMAKE_PREFIX_PATH.html#variable:CMAKE_PREFIX_PATH
>  , we can remove our custom find codes.
> CMake has a package-specific find path feature, {{<PackageName>_ROOT}}, since 
> CMake 3.12. It's the equivalent of our {{XXX_HOME}} variables: 
> https://cmake.org/cmake/help/v3.12/command/find_library.html
> {quote}
> If called from within a find module loaded by 
> {{find_package(<PackageName>)}}, search prefixes unique to the current 
> package being found. Specifically look in the {{<PackageName>_ROOT}} CMake 
> variable and the {{<PACKAGENAME>_ROOT}} environment variable. The package 
> root variables are maintained as a stack so if called from nested find 
> modules, root paths from the parent’s find module will be searched after 
> paths from the current module, i.e. {{<PackageName>_ROOT}}, 
> {{ENV\{<PackageName>_ROOT}}}, {{<ParentPackage>_ROOT}}, 
> {{ENV\{<ParentPackage>_ROOT}}}, etc.
> {quote}
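
Using the standard mechanisms the issue refers to would look roughly like this (illustrative invocations; the exact package name accepted for the {{_ROOT}} variable is an assumption):

```shell
# One shared prefix for all dependencies, searched by every find_package():
cmake -S arrow/cpp -B build -DCMAKE_PREFIX_PATH=/opt/arrow-deps

# Or, with CMake >= 3.12, a per-package root variable:
cmake -S arrow/cpp -B build -Ddouble-conversion_ROOT=/opt/dc
```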





[jira] [Updated] (ARROW-4428) [R] Feature flags for R build

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4428:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [R] Feature flags for R build
> -
>
> Key: ARROW-4428
> URL: https://issues.apache.org/jira/browse/ARROW-4428
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> There are a number of optional components in the Arrow C++ library. In Python 
> we have feature flags to turn on and off parts of the bindings based on what 
> C++ libraries have been built. There is also some logic to try to detect what 
> has been built and enable those features.
> We need to have the same thing in R. Some components, like Plasma, are not 
> available for Windows and so necessarily these will have to be flagged off. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4383) [C++] Use the CMake's standard find features

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791012#comment-16791012
 ] 

Wes McKinney commented on ARROW-4383:
-

I suggest we address this more comprehensively after ARROW-4611 is in

> [C++] Use the CMake's standard find features
> 
>
> Key: ARROW-4383
> URL: https://issues.apache.org/jira/browse/ARROW-4383
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
>
> From https://github.com/apache/arrow/pull/3469#discussion_r250862542
> We implement our own custom find code to locate libraries via 
> {{find_library()}}/{{find_path()}} with {{NO_DEFAULT_PATH}}, so we need to 
> handle {{lib64/}} (on Red Hat) and {{lib/x86_64-linux-gnu/}} (on Debian) 
> paths manually.
> If we use CMake's standard find features such as {{CMAKE_PREFIX_PATH}} 
> (https://cmake.org/cmake/help/v3.13/variable/CMAKE_PREFIX_PATH.html#variable:CMAKE_PREFIX_PATH), 
> we can remove our custom find code.
> CMake has a package-specific find path feature, {{<PackageName>_ROOT}}, since 
> 3.12. It's the equivalent of our custom find code: 
> https://cmake.org/cmake/help/v3.12/command/find_library.html
> {quote}
> If called from within a find module loaded by 
> {{find_package(<PackageName>)}}, search prefixes unique to the current 
> package being found. Specifically look in the {{<PackageName>_ROOT}} CMake 
> variable and the {{<PackageName>_ROOT}} environment variable. The package 
> root variables are maintained as a stack so if called from nested find 
> modules, root paths from the parent's find module will be searched after 
> paths from the current module, i.e. {{<CurrentPackage>_ROOT}}, 
> {{ENV\{<CurrentPackage>_ROOT}}}, {{<ParentPackage>_ROOT}}, 
> {{ENV\{<ParentPackage>_ROOT}}}, etc.
> {quote}
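> As a sketch of what switching to the standard find features could look like 
> (a hedged illustration, not the actual Arrow CMake code; the package name 
> {{Brotli}} and the module name are assumptions for the example):
> {code}
> # Let CMake's built-in search logic handle lib64/ and
> # lib/x86_64-linux-gnu/ instead of NO_DEFAULT_PATH lookups.
> # Callers can then point CMake at non-standard prefixes:
> #
> #   cmake -DCMAKE_PREFIX_PATH=/opt/thirdparty ..   # generic prefix list
> #   cmake -DBrotli_ROOT=/opt/brotli ..             # per-package, CMake >= 3.12
>
> find_package(Brotli)  # a Find module; no custom path handling required
> {code}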



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4844:
--

Assignee: Uwe L. Korn

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Assignee: Uwe L. Korn
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4427:

Fix Version/s: (was: 0.13.0)
   0.14.0

> Move Confluence Wiki pages to the Sphinx docs
> -
>
> Key: ARROW-4427
> URL: https://issues.apache.org/jira/browse/ARROW-4427
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Tanya Schlusser
>Priority: Major
> Fix For: 0.14.0
>
>
> It's hard to find and modify the ["Contributing to Apache 
> Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
>  and other developers' wiki pages in Confluence. If these were moved to 
> inside the project web page, that would make it easier.
> There are 5 steps to this:
>  # Create a new directory inside of `arrow/docs/source` to house the wiki 
> pages. (It will look like the 
> [cpp|https://github.com/apache/arrow/tree/master/docs/source/cpp] or 
> [python|https://github.com/apache/arrow/tree/master/docs/source/python] 
> directories.)
>  # Copy the wiki page contents to new `*.rst` pages inside this new directory.
>  # Add an `index.rst` that links to them all with enough description to help 
> navigation.
>  # Modify the Sphinx index page 
> [`arrow/docs/source/index.rst`|https://github.com/apache/arrow/blob/master/docs/source/index.rst]
>  to have an entry that points to the new index page made in step 3
>  # Modify the static site page 
> [`arrow/site/_includes/header.html`|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33]
>  to point to the newly created page instead of the wiki page.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4495) [C++][Gandiva] TestCastTimestampErrors failed in gandiva-precompiled-time_test in MSVC

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4495:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++][Gandiva] TestCastTimestampErrors failed in 
> gandiva-precompiled-time_test in MSVC
> --
>
> Key: ARROW-4495
> URL: https://issues.apache.org/jira/browse/ARROW-4495
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> See discussion in https://github.com/apache/arrow/pull/3567. This test is 
> disabled for now



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4547) [Python][Documentation] Update python/development.rst with instructions for CUDA-enabled builds

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4547:
---

Assignee: Wes McKinney

> [Python][Documentation] Update python/development.rst with instructions for 
> CUDA-enabled builds
> ---
>
> Key: ARROW-4547
> URL: https://issues.apache.org/jira/browse/ARROW-4547
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Building a CUDA-enabled install is not documented



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4702) [C++] Upgrade dependency versions

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4702:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Upgrade dependency versions
> -
>
> Key: ARROW-4702
> URL: https://issues.apache.org/jira/browse/ARROW-4702
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> At some point we should probably update the versions of the third-party 
> libraries we depend on. There might be useful bug or security fixes there, or 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4794) [Python] Make pandas an optional test dependency

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4794:
---

Assignee: Wes McKinney

> [Python] Make pandas an optional test dependency
> 
>
> Key: ARROW-4794
> URL: https://issues.apache.org/jira/browse/ARROW-4794
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> pandas is currently mandatory for running the Python tests, which makes it 
> impossible to properly run the test suite without it; however, pandas is an 
> optional dependency of pyarrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3246) [Python] direct reading/writing of pandas categoricals in parquet

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791008#comment-16791008
 ] 

Wes McKinney commented on ARROW-3246:
-

I moved this to 0.14. A bit of work will be needed in order to be able to 
sidestep hashing to categorical. If we can read BYTE_ARRAY columns directly 
back as Categorical (but have to hash) that is a good first step. 

> [Python] direct reading/writing of pandas categoricals in parquet
> -
>
> Key: ARROW-3246
> URL: https://issues.apache.org/jira/browse/ARROW-3246
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> Parquet supports "dictionary encoding" of column data in a manner very 
> similar to the concept of Categoricals in pandas. It is natural to use this 
> encoding for a column which originated as a categorical. Conversely, when 
> loading, if the file metadata says that a given column came from a pandas (or 
> arrow) categorical, then we can trust that the whole of the column is 
> dictionary-encoded and load the data directly into a categorical column, 
> rather than expanding the labels upon load and recategorising later.
> If the data does not have the pandas metadata, then the guarantee cannot 
> hold, and we cannot assume either that the whole column is dictionary encoded 
> or that the labels are the same throughout. In this case, the current 
> behaviour is fine.
>  
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)
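
The dictionary-encoding idea described above can be illustrated with a small pure-Python sketch (this is only an illustration of the concept, not Arrow's or Parquet's actual implementation): a categorical-aware reader could keep the (dictionary, codes) pair as-is, skipping the expand-then-re-hash round trip.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary, codes), analogous to
    Parquet/Arrow dictionary encoding or a pandas Categorical."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Expand codes back to the full column (what a reader does today
    before re-categorising)."""
    return [dictionary[c] for c in codes]

col = ["apple", "pear", "apple", "apple", "pear"]
dictionary, codes = dictionary_encode(col)
# A categorical-aware reader could hand (dictionary, codes) straight to
# pandas instead of decoding and hashing the expanded labels again.
assert dictionary == ["apple", "pear"]
assert codes == [0, 1, 0, 0, 1]
assert dictionary_decode(dictionary, codes) == col
```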



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3246) [Python] direct reading/writing of pandas categoricals in parquet

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3246:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] direct reading/writing of pandas categoricals in parquet
> -
>
> Key: ARROW-3246
> URL: https://issues.apache.org/jira/browse/ARROW-3246
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> Parquet supports "dictionary encoding" of column data in a manner very 
> similar to the concept of Categoricals in pandas. It is natural to use this 
> encoding for a column which originated as a categorical. Conversely, when 
> loading, if the file metadata says that a given column came from a pandas (or 
> arrow) categorical, then we can trust that the whole of the column is 
> dictionary-encoded and load the data directly into a categorical column, 
> rather than expanding the labels upon load and recategorising later.
> If the data does not have the pandas metadata, then the guarantee cannot 
> hold, and we cannot assume either that the whole column is dictionary encoded 
> or that the labels are the same throughout. In this case, the current 
> behaviour is fine.
>  
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2501) [Java] Remove Jackson from compile-time dependencies for arrow-vector

2019-03-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790993#comment-16790993
 ] 

Jacques Nadeau commented on ARROW-2501:
---

I think the use of Jackson is limited to a very small piece of non-core 
functionality. We should move that functionality out of the vector module to a 
place where people can use it only if they need to (if we even need that 
functionality). Several core classes are labeled with Jackson but they 
shouldn't need to be (Field, Schema, DictionaryEncoding). JsonFileReader and 
Writer need this stuff but that should really be separate from vector since 
that isn't focused on a real use case from my perspective (or maybe even just 
move to tests). 

> [Java] Remove Jackson from compile-time dependencies for arrow-vector
> -
>
> Key: ARROW-2501
> URL: https://issues.apache.org/jira/browse/ARROW-2501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.9.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I would like to upgrade Jackson to the latest version (2.9.5). If there are 
> no objections I will create a PR (it is literally just changing the version 
> number in the pom - no code changes required).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3619) [R] Expose global thread pool options

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3619.
-
Resolution: Fixed
  Assignee: Romain François

> [R] Expose global thread pool options
> 
>
> Key: ARROW-3619
> URL: https://issues.apache.org/jira/browse/ARROW-3619
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Romain François
>Priority: Major
> Fix For: 0.13.0
>
>
> This will permit users to configure multithreading options e.g. for 
> conversions.  See 
> https://github.com/apache/arrow/blob/master/python/pyarrow/lib.pyx#L40 in 
> Python



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791002#comment-16791002
 ] 

Wes McKinney commented on ARROW-3191:
-

I'm moving this to 0.14. This feature will need some time to harden, but the 
sooner it's implemented the better so we can get feedback from folks

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.14.0
>
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3191:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.14.0
>
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2887) [Plasma] Methods in plasma/store.h returning PlasmaError should return Status instead

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2887:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Plasma] Methods in plasma/store.h returning PlasmaError should return Status 
> instead
> -
>
> Key: ARROW-2887
> URL: https://issues.apache.org/jira/browse/ARROW-2887
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> These functions are not able to return other kinds of errors (e.g. 
> CUDA-related errors) as a result of this. I encountered this while working on 
> ARROW-2883



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790999#comment-16790999
 ] 

Wes McKinney commented on ARROW-2119:
-

The patch I put up is full of failures. I'm doubtful this can be resolved in 
time for 0.13

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like currently many places of the code assume that there needs to be 
> at least one record batch for streaming format. Is zero-recordbatch not 
> supported by design?
> e.g. 
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
>   public static void convert(InputStream in, OutputStream out) throws 
> IOException {
> BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
> try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>   VectorSchemaRoot root = reader.getVectorSchemaRoot();
>   // load the first batch before instantiating the writer so that we have 
> any dictionaries
>   if (!reader.loadNextBatch()) {
> throw new IOException("Unable to read first record batch");
>   }
>   ...
> {code}
> Pyarrow-0.8.0 does not load 0-recordbatch stream either. It would throw an 
> exception originated from 
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:]
> {code:none}
> Status Table::FromRecordBatches(const 
> std::vector<std::shared_ptr<RecordBatch>>& batches,
> std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
> return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790998#comment-16790998
 ] 

Wes McKinney commented on ARROW-3710:
-

OK, I'm going to move this to 0.14, and we can resolve once we have a public 
dashboard of some kind

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2769) [Python] Deprecate and rename add_metadata methods

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2769:

Component/s: Python

> [Python] Deprecate and rename add_metadata methods
> --
>
> Key: ARROW-2769
> URL: https://issues.apache.org/jira/browse/ARROW-2769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Minor
> Fix For: 0.13.0
>
>
> Deprecate and replace `pyarrow.Field.add_metadata` (and other likely named 
> methods) with replace_metadata, set_metadata or with_metadata. Knowing 
> Spark's immutable API, I would have chosen with_metadata but I guess this is 
> probably not what the average Python user would expect as naming.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3710:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4566) [C++][Flight] Add option to run arrow-flight-benchmark against a perf server running on a different host

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4566:
---

Assignee: Wes McKinney

> [C++][Flight] Add option to run arrow-flight-benchmark against a perf server 
> running on a different host
> 
>
> Key: ARROW-4566
> URL: https://issues.apache.org/jira/browse/ARROW-4566
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently the assumption is that both processes are running on localhost. 
> While also interesting (to see how fast things can go taking network IO out 
> of the equation) it is not very realistic. It would be good to both establish 
> a baseline network IO benchmark between two hosts and then see how close a 
> Flight stream can get to that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4677) [Python] serialization does not consider ndarray endianness

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4677:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] serialization does not consider ndarray endianness
> ---
>
> Key: ARROW-4677
> URL: https://issues.apache.org/jira/browse/ARROW-4677
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.1
> Environment: * pyarrow 0.12.1
> * numpy 1.16.1
> * Python 3.7.0
> * Intel Core i7-7820HQ
> * (macOS 10.13.6)
>Reporter: Gabe Joseph
>Priority: Minor
> Fix For: 0.14.0
>
>
> {{pa.serialize}} does not appear to properly encode the endianness of 
> multi-byte data:
> {code}
> # roundtrip.py 
> import numpy as np
> import pyarrow as pa
> arr = np.array([1], dtype=np.dtype('>i2'))
> buf = pa.serialize(arr).to_buffer()
> result = pa.deserialize(buf)
> print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
> np.testing.assert_array_equal(arr, result)
> {code}
> {code}
> $ pipenv run python roundtrip.py
> Original: >i2, deserialized: <i2
> Traceback (most recent call last):
>   File "roundtrip.py", line 10, in <module>
> np.testing.assert_array_equal(arr, result)
>   File 
> "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py",
>  line 896, in assert_array_equal
> verbose=verbose, header='Arrays are not equal')
>   File 
> "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py",
>  line 819, in assert_array_compare
> raise AssertionError(msg)
> AssertionError: 
> Arrays are not equal
> Mismatch: 100%
> Max absolute difference: 255
> Max relative difference: 0.99609375
>  x: array([1], dtype=int16)
>  y: array([256], dtype=int16)
> {code}
> The data of the deserialized array is identical (big-endian), but the dtype 
> Arrow assigns to it doesn't reflect its endianness (presumably uses the 
> system endianness, which is little).
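
The 1-versus-256 mismatch in the report can be reproduced with the standard library alone (a pure-Python illustration, independent of pyarrow): the bytes survive the round trip intact, but reinterpreting them with the wrong byte order changes the value.

```python
import struct

raw = b"\x00\x01"  # two bytes "on the wire"

big = struct.unpack(">h", raw)[0]     # interpret as big-endian int16
little = struct.unpack("<h", raw)[0]  # interpret as little-endian int16

assert big == 1      # the intended value
assert little == 256  # same bytes, wrong byte order, as in the report
```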



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4480) [Python] Drive letter removed when writing parquet file

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4480:

Priority: Blocker  (was: Major)

> [Python] Drive letter removed when writing parquet file 
> 
>
> Key: ARROW-4480
> URL: https://issues.apache.org/jira/browse/ARROW-4480
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Seb Fru
>Priority: Blocker
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Hi everyone,
>   
>  importing this from Github:
>   
>  I encountered a problem while working with pyarrow: I am working on Windows 
> 10. When I want to save a table using pq.write_table(tab, 
> r'E:\parquetfiles\file1.parquet'), I get the Error "No such file or 
> directory".
>   After searching a bit, I found out that the drive letter is getting removed 
> while parsing the where string, but I could not find a way to solve my 
> problem: 
> I can write the files on my C:\ drive without problems, but I am not able to 
> write a parquet file on another drive than C:.
>  Am I doing something wrong or is this just how it works? I would really 
> appreciate any help, because I just cannot fit my files on C: drive.
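
One plausible mechanism for this bug (an assumption for illustration, not confirmed by the report) is that the path goes through URI-style parsing, where a single-letter Windows drive prefix looks like a URI scheme; the standard library shows the effect:

```python
from urllib.parse import urlparse

path = r"E:\parquetfiles\file1.parquet"
parts = urlparse(path)

# "E:" is consumed as a (lowercased) URI scheme, so the drive letter
# is stripped from the path component.
assert parts.scheme == "e"
assert "parquetfiles" in parts.path
```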



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4735) [Go] Benchmark strconv.Format vs. fmt.Sprintf for CSV writer

2019-03-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4735:
--
Labels: pull-request-available  (was: )

> [Go] Benchmark strconv.Format vs. fmt.Sprintf for CSV writer
> 
>
> Key: ARROW-4735
> URL: https://issues.apache.org/jira/browse/ARROW-4735
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Anson Qian
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>
> Need to test out strconv.Format\{Bool,Float,Int,Uint} instead of fmt.Sprintf 
> and see if we can improve write performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4808) [Java][Vector] Convenience methods for setting decimal vector

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4808:

Component/s: Java

> [Java][Vector] Convenience methods for setting decimal vector
> -
>
> Key: ARROW-4808
> URL: https://issues.apache.org/jira/browse/ARROW-4808
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Provide more convenience methods to set decimal vector, specifically
>  # Accept arrow buffers encoded in little endian bytes that are of size < 16 
> bytes
>  # Accept arrow buffers that are encoded in big endian and could be of size 
> <= 16 bytes





[jira] [Updated] (ARROW-4690) [Python] Building TensorFlow compatible wheels for Arrow

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4690:

Component/s: Python

> [Python] Building TensorFlow compatible wheels for Arrow
> 
>
> Key: ARROW-4690
> URL: https://issues.apache.org/jira/browse/ARROW-4690
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Since the inclusion of LLVM, arrow wheels stopped working with TensorFlow 
> again (on some configurations at least).
> While we are continuing to discuss a more permanent solution in 
> https://groups.google.com/a/tensorflow.org/d/topic/developers/TMqRaT-H2bI/discussion,
>  I made some progress in creating TensorFlow-compatible wheels for an 
> unmodified pyarrow.
> They won't adhere to the manylinux1 standard, but they should be as 
> compatible as the TensorFlow wheels because they use the same build 
> environment (ubuntu 14.04).
> I'll create a PR with the necessary changes. I don't propose to ship these 
> wheels, but it might be a good idea to include the Docker image and 
> instructions on how to build them in the tree for organizations that want to use 
> TensorFlow with pyarrow on top of pip. The official recommendation should 
> probably be to use conda if the average user wants to do this for now.





[jira] [Updated] (ARROW-4323) [Packaging] Fix failing OSX clang conda forge builds

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4323:

Component/s: Python
 Packaging

> [Packaging] Fix failing OSX clang conda forge builds
> 
>
> Key: ARROW-4323
> URL: https://issues.apache.org/jira/browse/ARROW-4323
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Log: https://travis-ci.org/kszucs/crossbow/builds/482871537





[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790990#comment-16790990
 ] 

Jeroen edited comment on ARROW-4844 at 3/12/19 8:46 PM:


Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation and update arrow.pc to link with 
"-larrow -ldouble-conversion"

Shared libs (dll files) are not permitted in R so we need a static libarrow.


was (Author: jeroenooms):
Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation and update arrow.pc to link with 
"-larrow -ldouble-conversion"

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}





[jira] [Updated] (ARROW-1807) [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1807:

Component/s: Java

> [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers
> ---
>
> Key: ARROW-1807
> URL: https://issues.apache.org/jira/browse/ARROW-1807
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Siddharth Teotia
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Consolidate buffers to reduce the number of objects and heap usage:
>  <validity, data> => single buffer for fixed-width vectors
>  <validity + offsets> => single buffer for var-width and list vectors





[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790990#comment-16790990
 ] 

Jeroen edited comment on ARROW-4844 at 3/12/19 8:52 PM:


Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation (perhaps rename to 
libarrow_double-conversion.a) and update arrow.pc to link with "-larrow 
-ldouble-conversion"

Shared libs (dll files) are not permitted in R so we need a complete static 
build of libarrow.


was (Author: jeroenooms):
Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation (perhaps rename to 
libarrow_double-conversion.a) and update arrow.pc to link with "-larrow 
-ldouble-conversion"

Shared libs (dll files) are not permitted in R so we need a static libarrow.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7





[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790990#comment-16790990
 ] 

Jeroen edited comment on ARROW-4844 at 3/12/19 8:51 PM:


Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation (perhaps rename to 
libarrow_double-conversion.a) and update arrow.pc to link with "-larrow 
-ldouble-conversion"

Shared libs (dll files) are not permitted in R so we need a static libarrow.


was (Author: jeroenooms):
Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation and update arrow.pc to link with 
"-larrow -ldouble-conversion"

Shared libs (dll files) are not permitted in R so we need a static libarrow.

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7





[jira] [Assigned] (ARROW-4736) [Go] Optimize memory usage for CSV writer

2019-03-12 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-4736:
--

Assignee: Sebastien Binet

> [Go] Optimize memory usage for CSV writer
> -
>
> Key: ARROW-4736
> URL: https://issues.apache.org/jira/browse/ARROW-4736
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Anson Qian
>Assignee: Sebastien Binet
>Priority: Major
>
> Perhaps not for this PR, but depending on the number of rows and cols this 
> record contains, this may be a very large allocation and a very big memory 
> chunk.
> It could be more interesting performance-wise to write n rows at a time instead 
> of everything in one big chunk.
> Also, to reduce the memory pressure on the GC, we should probably (re)use 
> this slice-of-slices of strings.





[jira] [Updated] (ARROW-835) [Format] Add Timedelta type to describe time intervals

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-835:
---
Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Add Timedelta type to describe time intervals
> --
>
> Key: ARROW-835
> URL: https://issues.apache.org/jira/browse/ARROW-835
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: columnar-format-1.0
> Fix For: 0.14.0
>
>
> xref https://github.com/apache/arrow/pull/551 and 
> https://github.com/apache/arrow/pull/551#issuecomment-294325969
> This will allow round-tripping of pandas ``Timedelta`` and numpy 
> ``timedelta64[ns]`` types. They will have a similar TimeUnit to TimestampType 
> (s, us, ms, ns). Possible implementations include making this pure 64-bit.





[jira] [Commented] (ARROW-3150) [Python] Ship Flight-enabled Python wheels

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790991#comment-16790991
 ] 

Wes McKinney commented on ARROW-3150:
-

OK. I'll try to get Windows working this week

> [Python] Ship Flight-enabled Python wheels
> --
>
> Key: ARROW-3150
> URL: https://issues.apache.org/jira/browse/ARROW-3150
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> This may involve statically linking (or bundling where shared libs make 
> sense) the various required dependencies with {{libarrow_flight.so}} in the 
> manylinux1 wheel build.





[jira] [Updated] (ARROW-352) [Format] Interval(DAY_TIME) has no unit

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-352:
---
Fix Version/s: (was: 0.13.0)
   0.14.0

> [Format] Interval(DAY_TIME) has no unit
> ---
>
> Key: ARROW-352
> URL: https://issues.apache.org/jira/browse/ARROW-352
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Wes McKinney
>Priority: Major
>  Labels: columnar-format-1.0, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Interval(DAY_TIME) assumes milliseconds.
> We should have a time unit like timestamp.





[jira] [Updated] (ARROW-4104) [Java] race in AllocationManager during release

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4104:

Component/s: Java

> [Java] race in AllocationManager during release
> ---
>
> Key: ARROW-4104
> URL: https://issues.apache.org/jira/browse/ARROW-4104
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: arrow, java, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This is caused due to a bug in my changes for ARROW-1807. The synchronization 
> is happening on the BufferLedger instance instead of the AllocationManager 
> instance.





[jira] [Updated] (ARROW-4358) [Gandiva][Crossbow] Trusty build broken

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4358:

Component/s: Packaging
 C++ - Gandiva

> [Gandiva][Crossbow] Trusty build broken
> ---
>
> Key: ARROW-4358
> URL: https://issues.apache.org/jira/browse/ARROW-4358
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva, Packaging
>Reporter: Praveen Kumar Desabandu
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As a side effect of 
> https://github.com/apache/arrow/commit/1b8a7bc3baa4bce660c18a13934115d55f8733df,
>  Java builds on trusty are broken due to the removal of Travis Maven in that 
> commit.
> This JIRA is to support both environments.





[jira] [Updated] (ARROW-4273) [Release] Fix verification script to use cf201901 conda-forge label

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4273:

Component/s: Developer Tools

> [Release] Fix verification script to use cf201901 conda-forge label
> ---
>
> Key: ARROW-4273
> URL: https://issues.apache.org/jira/browse/ARROW-4273
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4315) [Website] Home page of https://arrow.apache.org/ does not mention Go or Rust

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4315:

Component/s: Website

> [Website] Home page of https://arrow.apache.org/ does not mention Go or Rust
> 
>
> Key: ARROW-4315
> URL: https://issues.apache.org/jira/browse/ARROW-4315
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Home page of https://arrow.apache.org/ does not mention Go or Rust



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4381) [Docker] docker-compose build lint fails

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4381:

Component/s: Developer Tools

> [Docker] docker-compose build lint fails
> 
>
> Key: ARROW-4381
> URL: https://issues.apache.org/jira/browse/ARROW-4381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> $ docker-compose build lint
> Building lint
> Step 1/4 : FROM arrow:python-3.6
> ERROR: Service 'lint' failed to build: pull access denied for arrow, 
> repository does not exist or may require 'docker login'
> {code}





[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790990#comment-16790990
 ] 

Jeroen commented on ARROW-4844:
---

Hmm I don't understand. How are bindings supposed to use `libarrow.a` if it 
depends on internal vendored libraries which are not shipped with the 
installation? There is no way to link to this?

It seems to me you either need to include the internal libdouble-conversion 
object files into `libarrow.a`, or alternatively ship libdouble-conversion.a 
along with libarrow.a in the installation and update arrow.pc to link with 
"-larrow -ldouble-conversion"
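The second option Jeroen proposes would amount to installing the vendored static library next to libarrow.a and teaching pkg-config about it. A hypothetical arrow.pc along those lines (paths and version are placeholders):

```
# Hypothetical arrow.pc, assuming libdouble-conversion.a is shipped
# alongside libarrow.a; prefix and Version are placeholders.
prefix=/usr/local
libdir=${prefix}/lib
includedir=${prefix}/include

Name: Apache Arrow
Description: Columnar in-memory analytics layer
Version: 0.12.1
Libs: -L${libdir} -larrow
Libs.private: -ldouble-conversion
Cflags: -I${includedir}
```

Putting the vendored dependency in Libs.private is the conventional pkg-config idiom for static-only link requirements: `pkg-config --libs arrow` yields `-larrow` alone, while `pkg-config --static --libs arrow` adds `-ldouble-conversion` for static linkers.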

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7





[jira] [Updated] (ARROW-4837) [C++] Support c++filt on a custom path in the run-test.sh script

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4837:

Component/s: Developer Tools
 C++

> [C++] Support c++filt on a custom path in the run-test.sh script
> 
>
> Key: ARROW-4837
> URL: https://issues.apache.org/jira/browse/ARROW-4837
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On conda this is CXXFILT=/opt/conda/bin/x86_64-conda_cos6-linux-gnu-c++filt





[jira] [Updated] (ARROW-4360) [C++] Query homebrew for Thrift

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4360:

Component/s: C++

> [C++] Query homebrew for Thrift
> ---
>
> Key: ARROW-4360
> URL: https://issues.apache.org/jira/browse/ARROW-4360
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Also search for Thrift with Homebrew when on OSX and THRIFT_HOME is not set.





[jira] [Updated] (ARROW-4693) [CI] Build boost library with multi precision

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4693:

Component/s: Python

> [CI] Build boost library with multi precision  
> ---
>
> Key: ARROW-4693
> URL: https://issues.apache.org/jira/browse/ARROW-4693
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is required for ARROW-4205.





[jira] [Updated] (ARROW-4198) [Gandiva] Add support to cast timestamp

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4198:

Component/s: C++ - Gandiva

> [Gandiva] Add support to cast timestamp
> ---
>
> Key: ARROW-4198
> URL: https://issues.apache.org/jira/browse/ARROW-4198
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: shyam narayan singh
>Assignee: shyam narayan singh
>Priority: Major
>  Labels: gandiva, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Add support to cast timestamp.





[jira] [Updated] (ARROW-4693) [CI] Build boost library with multi precision

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4693:

Component/s: Continuous Integration

> [CI] Build boost library with multi precision  
> ---
>
> Key: ARROW-4693
> URL: https://issues.apache.org/jira/browse/ARROW-4693
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is required for ARROW-4205.





[jira] [Updated] (ARROW-4267) [Python/C++][Parquet] Segfault when reading rowgroups with duplicated columns

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4267:

Summary: [Python/C++][Parquet] Segfault when reading rowgroups with 
duplicated columns  (was: [Python/C++] Segfault when reading rowgroups with 
duplicated columns)

> [Python/C++][Parquet] Segfault when reading rowgroups with duplicated columns
> -
>
> Key: ARROW-4267
> URL: https://issues.apache.org/jira/browse/ARROW-4267
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}
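Until the fix lands, the crash can be avoided on the caller side by de-duplicating the column list before the call. A minimal sketch (`dedupe_columns` is a hypothetical helper, not a pyarrow API):

```python
# Hypothetical caller-side guard: drop duplicate column names before
# calling read_row_group, preserving the original order.
def dedupe_columns(columns):
    """Return the column list with duplicates removed, order preserved."""
    seen = set()
    result = []
    for name in columns:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result

# Usage (assuming a ParquetFile as in the reproduction above):
# parquet_file.read_row_group(0, columns=dedupe_columns(["col", "col"]))
```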



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4267) [Python/C++] Segfault when reading rowgroups with duplicated columns

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4267:

Component/s: C++

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> 
>
> Key: ARROW-4267
> URL: https://issues.apache.org/jira/browse/ARROW-4267
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4267) [Python/C++] Segfault when reading rowgroups with duplicated columns

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4267:

Labels: parquet pull-request-available  (was: pull-request-available)

> [Python/C++] Segfault when reading rowgroups with duplicated columns
> 
>
> Key: ARROW-4267
> URL: https://issues.apache.org/jira/browse/ARROW-4267
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4024:

Component/s: Python

> [Python] Cython compilation error on cython==0.27.3
> ---
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> 
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
>     ^
> 
> pyarrow/public-api.pxi:95:5: Function signature does not match previous 
> declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
> failed{code}
> With 0.29.0 it is working. This might have been introduced in 
> [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b]
>  but I'm not sure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790984#comment-16790984
 ] 

Uwe L. Korn commented on ARROW-4844:


[~jeroenooms] This is the expected behaviour. I guess what you need for the R 
bindings is an option to bundle ALL static dependencies of {{libarrow.a}} 
into it? For the Python wheels, we build {{libarrow.so}} with all static 
dependencies included and then set the RPATH of the Python modules to 
{{$ORIGIN}} so they always pick up the correct {{libarrow.so}}. If that were 
possible for R, it would be the preferable solution from the build system side 
(and also from the interop side, since you could then use the same 
{{libarrow.so}} for Python and R).

> Static libarrow is missing vendored libdouble-conversion
> 
>
> Key: ARROW-4844
> URL: https://issues.apache.org/jira/browse/ARROW-4844
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Jeroen
>Priority: Major
>
> When trying to statically link the R bindings to libarrow.a, I get linking 
> errors which suggest that libdouble-conversion.a was not properly embedded in 
> libarrow.a. This problem happens on both MacOS and Windows.
> Here is the arrow build log: 
> https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7
> {code}
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
>  
> C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
>  undefined reference to 
> `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
> int*) const'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4401) [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4401:

Component/s: Python

> [Python] Alpine dockerfile fails to build because pandas requires numpy as 
> build dependency
> ---
>
> Key: ARROW-4401
> URL: https://issues.apache.org/jira/browse/ARROW-4401
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See failed crossbow task: 
> https://travis-ci.org/kszucs/crossbow/builds/484990267



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4320) [C++] Add tests for non-contiguous tensors

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4320:

Component/s: C++

> [C++] Add tests for non-contiguous tensors
> --
>
> Key: ARROW-4320
> URL: https://issues.apache.org/jira/browse/ARROW-4320
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> I would like to add some test cases for tensors with non-contiguous strides.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4695) [JS] Tests timing out on Travis

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4695:

Fix Version/s: 0.14.0  (was: 0.13.0)

> [JS] Tests timing out on Travis
> ---
>
> Key: ARROW-4695
> URL: https://issues.apache.org/jira/browse/ARROW-4695
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Brian Hulette
>Priority: Major
>  Labels: ci-failure, travis-ci
> Fix For: 0.14.0
>
>
> Example build: https://travis-ci.org/apache/arrow/jobs/498967250
> JS tests sometimes fail with the following message:
> {noformat}
> > apache-arrow@ test /home/travis/build/apache/arrow/js
> > NODE_NO_WARNINGS=1 gulp test
> [22:14:01] Using gulpfile ~/build/apache/arrow/js/gulpfile.js
> [22:14:01] Starting 'test'...
> [22:14:01] Starting 'test:ts'...
> [22:14:49] Finished 'test:ts' after 47 s
> [22:14:49] Starting 'test:src'...
> [22:15:27] Finished 'test:src' after 38 s
> [22:15:27] Starting 'test:apache-arrow'...
> No output has been received in the last 10m0s, this potentially indicates a 
> stalled build or something wrong with the build itself.
> Check the details on how to adjust your build configuration on: 
> https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
> The build has been terminated
> {noformat}
> I thought maybe we were just running up against some time limit, but that 
> particular build was terminated at 22:25:27, exactly ten minutes after the 
> last output, at 22:15:27. So it does seem like the build is somehow stalling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4322) [CI] docker nightlies fails after conda-forge compiler migration

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4322:

Component/s: Python
 Continuous Integration

> [CI] docker nightlies fails after conda-forge compiler migration
> 
>
> Key: ARROW-4322
> URL: https://issues.apache.org/jira/browse/ARROW-4322
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-835) [Format] Add Timedelta type to describe time intervals

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-835:
---
Component/s: Format

> [Format] Add Timedelta type to describe time intervals
> --
>
> Key: ARROW-835
> URL: https://issues.apache.org/jira/browse/ARROW-835
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: columnar-format-1.0
> Fix For: 0.13.0
>
>
> xref https://github.com/apache/arrow/pull/551 and 
> https://github.com/apache/arrow/pull/551#issuecomment-294325969
> this will allow round-tripping of pandas ``Timedelta`` and numpy 
> ``timedelta64[ns]`` types. They will have a TimeUnit similar to TimestampType 
> (s, us, ms, ns). Possible implementations include making this a pure 64-bit type.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-835) [Format] Add Timedelta type to describe time intervals

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790988#comment-16790988
 ] 

Wes McKinney commented on ARROW-835:


Moving to 0.14. We can make a push to get this into ship shape for that release 
cycle.

> [Format] Add Timedelta type to describe time intervals
> --
>
> Key: ARROW-835
> URL: https://issues.apache.org/jira/browse/ARROW-835
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: columnar-format-1.0
> Fix For: 0.14.0
>
>
> xref https://github.com/apache/arrow/pull/551 and 
> https://github.com/apache/arrow/pull/551#issuecomment-294325969
> this will allow round-tripping of pandas ``Timedelta`` and numpy 
> ``timedelta64[ns]`` types. They will have a TimeUnit similar to TimestampType 
> (s, us, ms, ns). Possible implementations include making this a pure 64-bit type.
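The proposal above can be illustrated with a small sketch of the encoding it implies: a single 64-bit integer count plus a TimeUnit, mirroring TimestampType. The helper names and the pure-64-bit assumption are illustrative, not the final spec:

```python
from datetime import timedelta

# Assumed unit set, mirroring TimestampType's TimeUnit.
UNIT_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}

def timedelta_to_int64(td, unit):
    """Encode a datetime.timedelta as an integer count of `unit`."""
    # Exact integer arithmetic via the timedelta's component fields,
    # avoiding float rounding from total_seconds().
    micros = (td.days * 86400 + td.seconds) * 10**6 + td.microseconds
    return micros * UNIT_PER_SECOND[unit] // 10**6

def int64_to_timedelta(value, unit):
    """Decode an integer count of `unit` back into a datetime.timedelta."""
    return timedelta(microseconds=value * 10**6 // UNIT_PER_SECOND[unit])

# Round trip at microsecond resolution is lossless.
td = timedelta(days=1, microseconds=250)
assert int64_to_timedelta(timedelta_to_int64(td, "us"), "us") == td
```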



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2860) [Python] Null values in a single partition of Parquet dataset, results in invalid schema on read

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2860:

Component/s: Python

> [Python] Null values in a single partition of Parquet dataset, results in 
> invalid schema on read
> 
>
> Key: ARROW-2860
> URL: https://issues.apache.org/jira/browse/ARROW-2860
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Sam Oluwalana
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> from datetime import datetime, timedelta
> def generate_data(event_type, event_id, offset=0):
>     """Generate data."""
>     now = datetime.utcnow() + timedelta(seconds=offset)
>     obj = {
>         'event_type': event_type,
>         'event_id': event_id,
>         'event_date': now.date(),
>         'foo': None,
>         'bar': u'hello',
>     }
>     if event_type == 2:
>         obj['foo'] = 1
>         obj['bar'] = u'world'
>     if event_type == 3:
>         obj['different'] = u'data'
>         obj['bar'] = u'event type 3'
>     else:
>         obj['different'] = None
>     return obj
> data = [
>     generate_data(1, 1, 1),
>     generate_data(1, 1, 3600 * 72),
>     generate_data(2, 1, 1),
>     generate_data(2, 1, 3600 * 72),
>     generate_data(3, 1, 1),
>     generate_data(3, 1, 3600 * 72),
> ]
> df = pd.DataFrame.from_records(data, index='event_id')
> table = pa.Table.from_pandas(df)
> pq.write_to_dataset(table, root_path='/tmp/events', 
> partition_cols=['event_type', 'event_date'])
> dataset = pq.ParquetDataset('/tmp/events')
> table = dataset.read()
> print(table.num_rows)
> {code}
> Expected output:
> {code:python}
> 6
> {code}
> Actual:
> {code:python}
> python example_failure.py
> Traceback (most recent call last):
>   File "example_failure.py", line 43, in <module>
> dataset = pq.ParquetDataset('/tmp/events')
>   File 
> "/Users/sam/.virtualenvs/test-parquet/lib/python2.7/site-packages/pyarrow/parquet.py",
>  line 745, in __init__
> self.validate_schemas()
>   File 
> "/Users/sam/.virtualenvs/test-parquet/lib/python2.7/site-packages/pyarrow/parquet.py",
>  line 775, in validate_schemas
> dataset_schema))
> ValueError: Schema in partition[event_type=2, event_date=0] 
> /tmp/events/event_type=3/event_date=2018-07-16 
> 00:00:00/be001bf576674d09825539f20e99ebe5.parquet was different.
> bar: string
> different: string
> foo: double
> event_id: int64
> metadata
> 
> {'pandas': '{"pandas_version": "0.23.3", "index_columns": ["event_id"], 
> "columns": [{"metadata": null, "field_name": "bar", "name": "bar", 
> "numpy_type": "object", "pandas_type": "unicode"}, {"metadata": null, 
> "field_name": "different", "name": "different", "numpy_type": "object", 
> "pandas_type": "unicode"}, {"metadata": null, "field_name": "foo", "name": 
> "foo", "numpy_type": "float64", "pandas_type": "float64"}, {"metadata": null, 
> "field_name": "event_id", "name": "event_id", "numpy_type": "int64", 
> "pandas_type": "int64"}], "column_indexes": [{"metadata": null, "field_name": 
> null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]}'}
> vs
> bar: string
> different: null
> foo: double
> event_id: int64
> metadata
> 
> {'pandas': '{"pandas_version": "0.23.3", "index_columns": ["event_id"], 
> "columns": [{"metadata": null, "field_name": "bar", "name": "bar", 
> "numpy_type": "object", "pandas_type": "unicode"}, {"metadata": null, 
> "field_name": "different", "name": "different", "numpy_type": "object", 
> "pandas_type": "empty"}, {"metadata": null, "field_name": "foo", "name": 
> "foo", "numpy_type": "float64", "pandas_type": "float64"}, {"metadata": null, 
> "field_name": "event_id", "name": "event_id", "numpy_type": "int64", 
> "pandas_type": "int64"}], "column_indexes": [{"metadata": null, "field_name": 
> null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]}'}
> {code}
> Apparently what is happening is that pyarrow infers the schema of each 
> partition individually, and only the partitions under `event_type=3 / 
> event_date=*` have values for the column `different`, whereas the other 
> partitions do not. The discrepancy causes the `None` values of the other 
> partitions to be labeled with `pandas_type` `empty` instead of `unicode`.
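The resolution implied above, promoting an all-None (null-typed) partition column to the concrete type seen in other partitions, can be sketched as follows. `unify_fields` and the plain-string type names are illustrative, not pyarrow's actual schema API:

```python
def unify_fields(schemas):
    """Merge per-partition schemas given as {column_name: type_string}.
    A 'null' type (from an all-None partition) is promoted to the
    concrete type seen in any other partition; genuinely conflicting
    concrete types still raise."""
    merged = {}
    for schema in schemas:
        for name, typ in schema.items():
            current = merged.get(name)
            if current is None or current == "null":
                merged[name] = typ  # first sighting, or promote from null
            elif typ != "null" and typ != current:
                raise ValueError(
                    f"conflicting types for {name}: {current} vs {typ}")
    return merged

# The two partition schemas from the traceback above:
part_a = {"bar": "string", "different": "string", "foo": "double"}
part_b = {"bar": "string", "different": "null", "foo": "double"}
assert unify_fields([part_a, part_b])["different"] == "string"
```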



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3973) [Gandiva][Java] Move the benchmark tests out of unit test scope.

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3973:

Component/s: C++ - Gandiva

> [Gandiva][Java] Move the benchmark tests out of unit test scope.
> 
>
> Key: ARROW-3973
> URL: https://issues.apache.org/jira/browse/ARROW-3973
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3973) [Gandiva][Java] Move the benchmark tests out of unit test scope.

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3973:

Fix Version/s: 0.14.0  (was: 0.13.0)

> [Gandiva][Java] Move the benchmark tests out of unit test scope.
> 
>
> Key: ARROW-3973
> URL: https://issues.apache.org/jira/browse/ARROW-3973
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4020) [Release] Remove source and binary artifacts from dev dist system after release vote passes

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4020:

Component/s: Developer Tools

> [Release] Remove source and binary artifacts from dev dist system after 
> release vote passes
> ---
>
> Key: ARROW-4020
> URL: https://issues.apache.org/jira/browse/ARROW-4020
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We have accumulated a lot of artifacts in 
> https://dist.apache.org/repos/dist/dev/arrow. This is not scalable. I'm going 
> to remove the old RC's for now, but this should be part of post-release 
> scripts or release manager duties



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3503) [Python] Allow config hadoop_bin in pyarrow hdfs.py

2019-03-12 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3503:

Component/s: Python

> [Python] Allow config hadoop_bin in pyarrow hdfs.py 
> 
>
> Key: ARROW-3503
> URL: https://issues.apache.org/jira/browse/ARROW-3503
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wenbo Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, hadoop_bin is derived either from `HADOOP_HOME` or from the bare 
> `hadoop` command. 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/hdfs.py#L130]
> However, in some environment setups the hadoop binary lives somewhere else. 
> Could we do something like 
>  
> {code:python}
> if 'HADOOP_BIN' in os.environ:
>     hadoop_bin = os.environ['HADOOP_BIN']
> elif 'HADOOP_HOME' in os.environ:
>     hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> else:
>     hadoop_bin = 'hadoop'
> {code}
>  
>  
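A runnable version of the lookup order proposed above might look like this (note that `HADOOP_BIN` is the proposed variable; released pyarrow only consults `HADOOP_HOME`):

```python
import os

def resolve_hadoop_bin(environ=None):
    """Resolve the hadoop executable: prefer an explicit HADOOP_BIN
    override (proposed), then HADOOP_HOME, then the bare `hadoop`
    found on PATH."""
    environ = os.environ if environ is None else environ
    if 'HADOOP_BIN' in environ:
        return environ['HADOOP_BIN']
    if 'HADOOP_HOME' in environ:
        return '{0}/bin/hadoop'.format(environ['HADOOP_HOME'])
    return 'hadoop'
```

Passing the environment as a parameter keeps the function testable without mutating `os.environ`.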



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4736) [Go] Optimize memory usage for CSV writer

2019-03-12 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-4736:
--

Assignee: (was: Sebastien Binet)

> [Go] Optimize memory usage for CSV writer
> -
>
> Key: ARROW-4736
> URL: https://issues.apache.org/jira/browse/ARROW-4736
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Anson Qian
>Priority: Major
>
> Perhaps not for this PR, but depending on the number of rows and columns the 
> record contains, this may be a very large allocation and a very big memory 
> chunk.
> It could be more interesting performance-wise to write n rows at a time 
> instead of everything in one big chunk.
> Also, to reduce the memory pressure on the GC, we should probably (re)use 
> this slice-of-slices of strings.
>  
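The suggestion above, writing n rows at a time while reusing one row buffer, can be sketched in Python (the Go writer would do the equivalent with a reused `[][]string`); `write_record_chunked` and the columnar input shape are illustrative, not the Go module's API:

```python
import csv
import io

def write_record_chunked(writer, columns, chunk_size=1024):
    """Write a columnar record as CSV, `chunk_size` rows at a time,
    reusing a single row buffer instead of materialising every row
    up front. `columns` is a list of equal-length value lists (a
    stand-in for an Arrow record batch)."""
    n_rows = len(columns[0]) if columns else 0
    row = [None] * len(columns)  # reused buffer: one allocation total
    for start in range(0, n_rows, chunk_size):
        for i in range(start, min(start + chunk_size, n_rows)):
            for j, col in enumerate(columns):
                row[j] = col[i]
            writer.writerow(row)  # values are serialised immediately

buf = io.StringIO()
write_record_chunked(csv.writer(buf), [[1, 2, 3], ["a", "b", "c"]],
                     chunk_size=2)
```

Reusing the buffer is safe here because `writerow` serialises the values before returning, so the next iteration can overwrite them.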



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

