[jira] [Commented] (ARROW-596) [Python] Add convenience function to convert pandas.DataFrame to pyarrow.Buffer containing a file or stream representation

2017-03-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907023#comment-15907023
 ] 

Antoine Pitrou commented on ARROW-596:
--

Cython allows you to implement the buffer protocol: see 
https://cython.readthedocs.io/en/latest/src/userguide/buffer.html . I've never 
used it but it looks similar to what you would do in C.

Note that pyarrow.Buffer needs to be a fixed-size buffer for that operation to 
make sense. If not, then __getbuffer__ should lock the buffer size until 
__releasebuffer__ is called.
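The size-locking behavior described above can be seen with plain CPython objects: a `memoryview` is a buffer export, and the exporter refuses to resize until the view is released. A minimal illustration (using a `bytearray` as a stand-in for pyarrow.Buffer):

```python
# While a buffer export (memoryview) is outstanding, CPython refuses to
# resize the exporting bytearray -- the same guarantee __getbuffer__ /
# __releasebuffer__ would have to provide for a resizable pyarrow.Buffer.
ba = bytearray(b"hello")
mv = memoryview(ba)
try:
    ba += b"!"          # resize attempt while exported
except BufferError:
    resized = False     # "Existing exports of data: object cannot be re-sized"
else:
    resized = True
mv.release()            # analogous to __releasebuffer__
ba += b"!"              # resizing succeeds once the export is gone
assert resized is False and bytes(ba) == b"hello!"
```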

> [Python] Add convenience function to convert pandas.DataFrame to 
> pyarrow.Buffer containing a file or stream representation
> --
>
> Key: ARROW-596
> URL: https://issues.apache.org/jira/browse/ARROW-596
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-2544) [CI] Run C++ tests with two jobs on Travis-CI

2018-05-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2544:
-

 Summary: [CI] Run C++ tests with two jobs on Travis-CI
 Key: ARROW-2544
 URL: https://issues.apache.org/jira/browse/ARROW-2544
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Antoine Pitrou
Assignee: Omer Katz


See https://github.com/apache/arrow/pull/1899



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2545) [Python] Arrow fails linking against statically-compiled Python

2018-05-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2545:
-

 Summary: [Python] Arrow fails linking against statically-compiled 
Python
 Key: ARROW-2545
 URL: https://issues.apache.org/jira/browse/ARROW-2545
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


See 
https://issues.apache.org/jira/browse/ARROW-1661?focusedCommentId=16462745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16462745
 : to link statically against {{libpythonXX.a}}, you need to add in some system 
libraries such as {{libutil}}. Otherwise some symbols end up unresolved.
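As a side note, CPython itself records which extra system libraries its static library needs; a sketch of querying that list through the stdlib `sysconfig` module (the exact value varies by platform and build):

```python
import sysconfig

# LIBS lists the system libraries CPython was linked against; these
# typically must be repeated on the link line when linking statically
# against libpythonX.Y.a (e.g. "-lpthread -ldl -lutil -lm" on many
# Linux builds -- the exact contents vary).
libs = sysconfig.get_config_var("LIBS")
print(libs)
```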



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2546) [CI] Intermittent npm failures

2018-05-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2546:
-

 Summary: [CI] Intermittent npm failures
 Key: ARROW-2546
 URL: https://issues.apache.org/jira/browse/ARROW-2546
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, JavaScript
Reporter: Antoine Pitrou


See for example https://travis-ci.org/apache/arrow/jobs/375891278 .

{code}
npm WARN deprecated gulp-util@3.0.8: gulp-util is deprecated - replace it, 
following the guidelines at https://medium.com/gulpjs/gulp-util-ca3b1f9f9ac5
npm WARN deprecated standard-format@1.6.10: standard-format is deprecated in 
favor of a built-in autofixer in 'standard'. Usage: standard --fix
npm WARN deprecated minimatch@2.0.10: Please update to minimatch 3.0.2 or 
higher to avoid a RegExp DoS issue
npm WARN tar ENOENT: no such file or directory, open 
'/home/travis/build/apache/arrow/js/node_modules/.staging/google-closure-compiler-2d7bab98/contrib/externs/maps/google_maps_api_v3_23.js'
npm WARN ajv-keywords@3.2.0 requires a peer of ajv@^6.0.0 but none is 
installed. You must install peer dependencies yourself.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.3 
(node_modules/fsevents):
npm WARN enoent SKIPPING OPTIONAL DEPENDENCY: ENOENT: no such file or 
directory, rename 
'/home/travis/build/apache/arrow/js/node_modules/.staging/fsevents-5f35bbaf/node_modules/abbrev'
 -> '/home/travis/build/apache/arrow/js/node_modules/.staging/abbrev-e214f964'
npm ERR! code EINTEGRITY
npm ERR! 
sha512-bqB1yS6o9TNA9ZC/MJxM0FZzPnZdtHj0xWK/IZ5khzVqdpGul/R/EIiHRgFXlwTD7PSIaYVnGKq1QgMCu2mnqw==
 integrity checksum failed when using sha512: wanted 
sha512-bqB1yS6o9TNA9ZC/MJxM0FZzPnZdtHj0xWK/IZ5khzVqdpGul/R/EIiHRgFXlwTD7PSIaYVnGKq1QgMCu2mnqw==
 but got 
sha512-kgTmj+eAwkxGNzcVy5l66pJ3Exmxgj4IdQQ5fK53JTbfThLZFQybsk64V8pq2MMKXcqkkU6/0gGHXKbURv065w==.
 (4688848 bytes)
npm ERR! A complete log of this run can be found in:
npm ERR! /home/travis/.npm/_logs/2018-05-07T13_34_45_558Z-debug.log
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2563) [Rust] Poor caching in Travis-CI

2018-05-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2563:
-

 Summary: [Rust] Poor caching in Travis-CI
 Key: ARROW-2563
 URL: https://issues.apache.org/jira/browse/ARROW-2563
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Rust
Reporter: Antoine Pitrou


Since the Rust project isn't at the repo root, Travis-CI won't cache compiled 
artifacts by default. This leads to long CI times, as all packages get 
recompiled (see https://docs.travis-ci.com/user/caching/#Rust-Cargo-cache for 
what gets cached).

In https://travis-ci.org/pitrou/arrow/jobs/376859806 I tried the following:
{code}
export CARGO_TARGET_DIR=$TRAVIS_BUILD_DIR/target
{code}

and after a first run, the build time went down to 2 minutes (from 15-18 
minutes).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2562) [C++] Upload coverage data to codecov.io

2018-05-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2562:
-

 Summary: [C++] Upload coverage data to codecov.io
 Key: ARROW-2562
 URL: https://issues.apache.org/jira/browse/ARROW-2562
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


ARROW-27 (uploading coverage data to coveralls.io) has stalled. We can try 
codecov.io instead, another free code coverage hosting service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2561) [C++] Crash in cuda-test shutdown with coverage enabled

2018-05-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2561:
-

 Summary: [C++] Crash in cuda-test shutdown with coverage enabled
 Key: ARROW-2561
 URL: https://issues.apache.org/jira/browse/ARROW-2561
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, GPU
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


If I enable both CUDA and code coverage (using 
{{-DARROW_GENERATE_COVERAGE=on}}), {{cuda-test}} sometimes crashes at shutdown 
with the following message:

{code}
*** Error in `./build-test/debug/cuda-test': corrupted size vs. prev_size: 
0x01612bb0 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fc3d61e47e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7e9dc)[0x7fc3d61eb9dc]
/lib/x86_64-linux-gnu/libc.so.6(+0x81cde)[0x7fc3d61eecde]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fc3d61f1184]
/home/antoine/arrow/cpp/build-test/debug/libarrow.so.10(+0x9350f3)[0x7fc3d5a510f3]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0x9a)[0x7fc3d61a736a]
/home/antoine/arrow/cpp/build-test/debug/libarrow.so.10(+0x3415e3)[0x7fc3d545d5e3]
{code}

(the CUDA tests themselves pass)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2566) [CI] Add codecov.io badge to README

2018-05-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2566:
-

 Summary: [CI] Add codecov.io badge to README
 Key: ARROW-2566
 URL: https://issues.apache.org/jira/browse/ARROW-2566
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2568) [Python] Expose thread pool size setting to Python, and deprecate "nthreads"

2018-05-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2568:
-

 Summary: [Python] Expose thread pool size setting to Python, and 
deprecate "nthreads"
 Key: ARROW-2568
 URL: https://issues.apache.org/jira/browse/ARROW-2568
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Now that we have a global thread pool, we should:
 * use it in places where we currently require an explicit number of threads 
(with an additional {{use_threads}} argument to enable parallelism)
 * deprecate the now pointless {{nthreads}} argument
 * expose the thread pool capacity setting in Python
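The deprecation step above could look something like this sketch (the function and parameter names besides {{nthreads}}/{{use_threads}} are illustrative, not the actual pyarrow API):

```python
import warnings

def convert(data, use_threads=False, nthreads=None):
    # Hypothetical wrapper sketching the migration: the legacy 'nthreads'
    # argument keeps working but emits a DeprecationWarning, and is mapped
    # onto the new boolean 'use_threads' flag.
    if nthreads is not None:
        warnings.warn("'nthreads' is deprecated, use 'use_threads' instead",
                      DeprecationWarning, stacklevel=2)
        use_threads = nthreads > 1
    return {"use_threads": use_threads, "data": data}
```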



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2574) [CI] Collect and publish Python coverage

2018-05-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2574:
-

 Summary: [CI] Collect and publish Python coverage
 Key: ARROW-2574
 URL: https://issues.apache.org/jira/browse/ARROW-2574
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Now that our Travis-CI setup is able to collect and publish C++ and Rust 
coverage, we should do the same for Python and Cython modules in pyarrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2588) [Plasma] Random unique ids always use the same seed

2018-05-16 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2588:
-

 Summary: [Plasma] Random unique ids always use the same seed
 Key: ARROW-2588
 URL: https://issues.apache.org/jira/browse/ARROW-2588
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Reporter: Antoine Pitrou


Following GitHub PR #2039 (resolution to ARROW-2578), the random generator for 
random object ids is now using a constant default seed, meaning all processes 
will generate the same sequence of random ids:
{code:java}
$ python -c "from pyarrow import plasma; print(plasma.ObjectID.from_random())"
ObjectID(d022e7d520f8e938a14e188c47308cfef5fff7f7)
$ python -c "from pyarrow import plasma; print(plasma.ObjectID.from_random())"
ObjectID(d022e7d520f8e938a14e188c47308cfef5fff7f7)
{code}
As a side note, the Plasma test suite should ideally test for this.
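The failure mode is easy to reproduce with the stdlib's generator: a constant seed yields identical sequences across processes, while seeding from the OS entropy pool does not. A minimal sketch:

```python
import os
import random

# Two generators seeded with the same constant produce the same "random"
# ids -- the bug described above.
a = random.Random(42)
b = random.Random(42)
assert a.getrandbits(64) == b.getrandbits(64)

# Seeding from OS entropy gives independent sequences per process.
c = random.Random(os.urandom(16))
d = random.Random(os.urandom(16))
print(hex(c.getrandbits(64)), hex(d.getrandbits(64)))
```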



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2589) [Python] test_parquet.py regression with Pandas 0.23.0

2018-05-16 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2589:
-

 Summary: [Python] test_parquet.py regression with Pandas 0.23.0
 Key: ARROW-2589
 URL: https://issues.apache.org/jira/browse/ARROW-2589
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See e.g. https://travis-ci.org/apache/arrow/jobs/379652352#L3124.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2642) [Python] Fail building parquet binding on Windows

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2642:
-

 Summary: [Python] Fail building parquet binding on Windows
 Key: ARROW-2642
 URL: https://issues.apache.org/jira/browse/ARROW-2642
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


For some reason I get the following error. I'm not sure why Thrift is needed 
here:

{code}
-- Found the Parquet library: C:/Miniconda3/envs/arrow/Library/lib/parquet.lib
-- THRIFT_HOME:
-- Thrift compiler/libraries NOT found:  (THRIFT_INCLUDE_DIR-NOTFOUND, 
THRIFT_STATIC_LIB-NOTFOUND). Looked in system search paths.
-- Boost version: 1.66.0
-- Found the following Boost libraries:
--   regex
Added static library dependency boost_regex: 
C:/Miniconda3/envs/arrow/Library/lib/libboost_regex.lib
Added static library dependency parquet: 
C:/Miniconda3/envs/arrow/Library/lib/parquet_static.lib
CMake Error at C:/t/arrow/cpp/cmake_modules/BuildUtils.cmake:88 (message):
  No static or shared library provided for thrift
Call Stack (most recent call first):
  CMakeLists.txt:376 (ADD_THIRDPARTY_LIB)

{code}

The {{thrift-cpp}} package from conda-forge is installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2643) [C++] Travis-CI build failure with cpp toolchain enabled

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2643:
-

 Summary: [C++] Travis-CI build failure with cpp toolchain enabled
 Key: ARROW-2643
 URL: https://issues.apache.org/jira/browse/ARROW-2643
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


This is a new failure, perhaps triggered by a conda-forge package update. See 
example at https://travis-ci.org/apache/arrow/jobs/385002355#L2235

{code}
/usr/bin/ld: 
/home/travis/build/apache/arrow/cpp-toolchain/lib/libz.a(deflate.o): relocation 
R_X86_64_32S against `zcalloc' can not be used when making a shared object; 
recompile with -fPIC
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2660) [Python] Experiment with zero-copy pickling

2018-06-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2660:
-

 Summary: [Python] Experiment with zero-copy pickling
 Key: ARROW-2660
 URL: https://issues.apache.org/jira/browse/ARROW-2660
 Project: Apache Arrow
  Issue Type: Wish
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


PEP 574 has an implementation ready and a PyPI-available backport (at 
[https://pypi.org/project/pickle5/] ). Adding experimental support for it would 
allow for zero-copy pickling of Arrow arrays, columns, etc.

I think it mainly involves implementing {{__reduce_ex__}} on the {{Buffer}} 
class, as described in [https://www.python.org/dev/peps/pep-0574/#producer-api].

In addition, the consumer API added by PEP 574 could be used in Arrow's 
serialization layer, to avoid or minimize copies when serializing foreign 
objects.
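For reference, the out-of-band mechanism later landed in the stdlib as pickle protocol 5 (Python 3.8+, with the same API as the {{pickle5}} backport). A minimal sketch of the producer/consumer round trip:

```python
import pickle

# PEP 574 out-of-band pickling: the payload contains no copy of 'data';
# the buffer travels out of band via buffer_callback and is handed back
# to loads() through the 'buffers' argument.
data = bytearray(b"x" * 16)
buffers = []
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)
assert bytes(restored) == b"x" * 16
```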



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2644) [Python] parquet binding fails building on AppVeyor

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2644:
-

 Summary: [Python] parquet binding fails building on AppVeyor
 Key: ARROW-2644
 URL: https://issues.apache.org/jira/browse/ARROW-2644
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This is a new issue (perhaps due to a new Cython version). See e.g. 
https://ci.appveyor.com/project/pitrou/arrow/build/1.0.509/job/dxdqcdk30kmiy6pd#L4291

Excerpt:

{code}
-- Running cmake --build for pyarrow
C:\Program Files (x86)\CMake\bin\cmake.exe --build . --config release
[1/8] cmd.exe /C "cd /D 
C:\projects\arrow\python\build\temp.win-amd64-3.6\Release && 
C:\Miniconda36-x64\envs\arrow\python.exe -m cython --cplus --working 
C:/projects/arrow/python --output-file 
C:/projects/arrow/python/build/temp.win-amd64-3.6/Release/_parquet.cpp 
C:/projects/arrow/python/pyarrow/_parquet.pyx"
[2/8] cmd.exe /c
[3/8] cmd.exe /C "cd /D 
C:\projects\arrow\python\build\temp.win-amd64-3.6\Release && 
C:\Miniconda36-x64\envs\arrow\python.exe -m cython --cplus --working 
C:/projects/arrow/python --output-file 
C:/projects/arrow/python/build/temp.win-amd64-3.6/Release/lib.cpp 
C:/projects/arrow/python/pyarrow/lib.pyx"
[4/8] cmd.exe /c
[5/8] 
C:\PROGRA~2\MIB055~1\2017\COMMUN~1\VC\Tools\MSVC\1414~1.264\bin\Hostx64\x64\cl.exe
   /TP -DARROW_EXPORTING -D_CRT_SECURE_NO_WARNINGS -D_parquet_EXPORTS 
-IC:\Miniconda36-x64\envs\arrow\lib\site-packages\numpy\core\include 
-IC:\Miniconda36-x64\envs\arrow\include -I..\..\..\src 
-IC:\Miniconda36-x64\envs\arrow\Library\include /bigobj /W3 /wd4800 /DWIN32 
/D_WINDOWS  /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /WX /wd4190 
/wd4293 /wd4800 /MD /O2 /Ob2 /DNDEBUG /showIncludes 
/FoCMakeFiles\_parquet.dir\_parquet.cpp.obj /FdCMakeFiles\_parquet.dir\ /FS -c 
_parquet.cpp
FAILED: CMakeFiles/_parquet.dir/_parquet.cpp.obj 
C:\PROGRA~2\MIB055~1\2017\COMMUN~1\VC\Tools\MSVC\1414~1.264\bin\Hostx64\x64\cl.exe
   /TP -DARROW_EXPORTING -D_CRT_SECURE_NO_WARNINGS -D_parquet_EXPORTS 
-IC:\Miniconda36-x64\envs\arrow\lib\site-packages\numpy\core\include 
-IC:\Miniconda36-x64\envs\arrow\include -I..\..\..\src 
-IC:\Miniconda36-x64\envs\arrow\Library\include /bigobj /W3 /wd4800 /DWIN32 
/D_WINDOWS  /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /WX /wd4190 
/wd4293 /wd4800 /MD /O2 /Ob2 /DNDEBUG /showIncludes 
/FoCMakeFiles\_parquet.dir\_parquet.cpp.obj /FdCMakeFiles\_parquet.dir\ /FS -c 
_parquet.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.14.26428.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.
_parquet.cpp(6790): error C2220: warning treated as error - no 'object' file 
generated
_parquet.cpp(6790): warning C4244: 'argument': conversion from 'int64_t' to 
'long', possible loss of data
[6/8] 
C:\PROGRA~2\MIB055~1\2017\COMMUN~1\VC\Tools\MSVC\1414~1.264\bin\Hostx64\x64\cl.exe
   /TP -DARROW_EXPORTING -D_CRT_SECURE_NO_WARNINGS -Dlib_EXPORTS 
-IC:\Miniconda36-x64\envs\arrow\lib\site-packages\numpy\core\include 
-IC:\Miniconda36-x64\envs\arrow\include -I..\..\..\src 
-IC:\Miniconda36-x64\envs\arrow\Library\include /bigobj /W3 /wd4800 /DWIN32 
/D_WINDOWS  /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /WX /wd4190 
/wd4293 /wd4800 /MD /O2 /Ob2 /DNDEBUG /showIncludes 
/FoCMakeFiles\lib.dir\lib.cpp.obj /FdCMakeFiles\lib.dir\ /FS -c lib.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.14.26428.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.
ninja: build stopped: subcommand failed.
error: command 'C:\\Program Files (x86)\\CMake\\bin\\cmake.exe' failed with 
exit status 1
(arrow) C:\projects\arrow\python>set lastexitcode=1 
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2641) [C++] Investigate spurious memset() calls

2018-05-28 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2641:
-

 Summary: [C++] Investigate spurious memset() calls
 Key: ARROW-2641
 URL: https://issues.apache.org/jira/browse/ARROW-2641
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


{{builder.cc}} has TODO statements of the form:

{code:c++}
  // TODO(emkornfield) valgrind complains without this
  memset(data_->mutable_data(), 0, static_cast<size_t>(nbytes));
{code}

Ideally we shouldn't have to zero-initialize a data buffer before writing to it.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2653) [C++] Refactor hash table support

2018-05-31 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2653:
-

 Summary: [C++] Refactor hash table support
 Key: ARROW-2653
 URL: https://issues.apache.org/jira/browse/ARROW-2653
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Currently our hash table support is scattered in several places:
 * {{compute/kernels/hash.cc}}
 * {{util/hash.h}} and {{util/hash.cc}}
 * {{builder.cc}} (in the DictionaryBuilder implementation)

Perhaps we should have something like a type-parameterized hash table class 
(perhaps backed by non-owned memory) with several primitives:
 * decide allocation size for a given number of items
 * lookup an item
 * insert an item
 * decide whether resizing is needed
 * resize to a new memory area
 * ...
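The primitives above can be sketched as a minimal open-addressing (linear probing) table; the class name, sizing policy, and load factor here are illustrative, not a proposal for Arrow's actual C++ design:

```python
class FixedHashTable:
    """Minimal linear-probing table with the primitives listed above."""

    @staticmethod
    def allocation_size(n_items, load_factor=0.5):
        # Decide the allocation size for a given number of items.
        size = 8
        while size * load_factor < n_items:
            size *= 2
        return size

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # in C++ this could be non-owned memory
        self.count = 0

    def _probe(self, key):
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def lookup(self, key):
        entry = self.slots[self._probe(key)]
        return entry[1] if entry is not None else None

    def insert(self, key, value):
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def needs_resize(self):
        # Decide whether resizing is needed (keep load factor below 1/2).
        return self.count * 2 >= self.capacity

    def resize_into(self, new_capacity):
        # Resize to a new memory area by re-inserting all entries.
        new = FixedHashTable(new_capacity)
        for entry in self.slots:
            if entry is not None:
                new.insert(*entry)
        return new
```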



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2740) [Python] Add address property to Buffer

2018-06-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2740:
-

 Summary: [Python] Add address property to Buffer
 Key: ARROW-2740
 URL: https://issues.apache.org/jira/browse/ARROW-2740
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This would allow getting the start address of the buffer's data.
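A sketch of what such an address looks like from pure Python, using `ctypes` on a `bytearray` as a stand-in for pyarrow.Buffer (note this creates a buffer export, so the bytearray cannot be resized while the ctypes view is alive):

```python
import ctypes

# Obtain the start address of a Python-managed buffer's data -- the same
# information the proposed Buffer.address property would expose.
buf = bytearray(b"arrow")
view = (ctypes.c_char * len(buf)).from_buffer(buf)
addr = ctypes.addressof(view)
assert isinstance(addr, int) and addr > 0
```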



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2785) [C++] Crash in json-integration-test

2018-07-02 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2785:
-

 Summary: [C++] Crash in json-integration-test
 Key: ARROW-2785
 URL: https://issues.apache.org/jira/browse/ARROW-2785
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


This is probably something I keep getting wrong when creating a new 
environment, but after creating a Python 3.7 conda environment and installing 
the toolchain, I get the following crash (apparently boost-related):

{code}
$ ./build-test/debug/json-integration-test 
[==] Running 2 tests from 1 test case.
[--] Global test environment set-up.
[--] 2 tests from TestJSONIntegration
[ RUN  ] TestJSONIntegration.ConvertAndValidate
*** Error in `./build-test/debug/json-integration-test': munmap_chunk(): 
invalid pointer: 0x7ffc22542578 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f4762f257e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7f4762f32698]
/home/antoine/miniconda3/envs/pyarrow37/lib/libstdc++.so.6(_ZNSsD1Ev+0x15)[0x7f476384cca5]
./build-test/debug/json-integration-test(_ZN5boost10filesystem4pathD1Ev+0x18)[0x694f4a]
./build-test/debug/json-integration-test[0x69205a]
./build-test/debug/json-integration-test(_ZN5arrow3ipc19TestJSONIntegration7mkstempEv+0x2c)[0x69599e]
./build-test/debug/json-integration-test(_ZN5arrow3ipc43TestJSONIntegration_ConvertAndValidate_Test8TestBodyEv+0x3b)[0x69210f]
./build-test/debug/json-integration-test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x65)[0x8759da]
./build-test/debug/json-integration-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x5a)[0x86f65d]
./build-test/debug/json-integration-test(_ZN7testing4Test3RunEv+0xd5)[0x853697]
./build-test/debug/json-integration-test(_ZN7testing8TestInfo3RunEv+0x105)[0x853fef]
./build-test/debug/json-integration-test(_ZN7testing8TestCase3RunEv+0xf4)[0x8546f8]
./build-test/debug/json-integration-test(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x2ac)[0x85b666]
./build-test/debug/json-integration-test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x65)[0x876eb7]
./build-test/debug/json-integration-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x5a)[0x870327]
./build-test/debug/json-integration-test(_ZN7testing8UnitTest3RunEv+0xc6)[0x85a128]
./build-test/debug/json-integration-test(_Z13RUN_ALL_TESTSv+0x11)[0x6945e6]
./build-test/debug/json-integration-test(main+0xfb)[0x693a2b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f4762ece830]
./build-test/debug/json-integration-test(_start+0x29)[0x68b4a9]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2534:
-

 Summary: [C++] libarrow.so leaks zlib symbols
 Key: ARROW-2534
 URL: https://issues.apache.org/jira/browse/ARROW-2534
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I get the following here:

{code:bash}
$ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep 
' T ' | \grep -v arrow
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2514) [Python] Inferring / converting nested Numpy array is very slow

2018-04-26 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2514:
-

 Summary: [Python] Inferring / converting nested Numpy array is 
very slow
 Key: ARROW-2514
 URL: https://issues.apache.org/jira/browse/ARROW-2514
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Converting a nested Numpy array walks over the Numpy data as Python 
objects, even if the dtype is not "object". This makes it pointlessly slow 
compared to the non-nested case, and even compared to the nested Python list case:

{code:python}
>>> %%timeit data = list(range(1))
...:pa.array(data)
...:
746 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit data = np.arange(1)
...:pa.array(data)
...:
81.1 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)
>>> %%timeit data = [np.arange(1)]
...:pa.array(data)
...:
3.39 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2532) [C++] Add chunked builder classes

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2532:
-

 Summary: [C++] Add chunked builder classes
 Key: ARROW-2532
 URL: https://issues.apache.org/jira/browse/ARROW-2532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I think it would be useful to have chunked builders for list, string and binary 
types. A chunked builder would produce a chunked array as output, circumventing 
the 32-bit offset limit of those types. There's some special-casing scattered 
around our Numpy conversion routines right now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2522) [C++] Version shared library files

2018-04-28 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2522:
-

 Summary: [C++] Version shared library files
 Key: ARROW-2522
 URL: https://issues.apache.org/jira/browse/ARROW-2522
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


We should version installed shared library files (.so under Unix, .dll under 
Windows) to disambiguate incompatible ABI versions.

CMake provides support for that:
http://pusling.com/blog/?p=352
https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html
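A hypothetical CMake fragment showing what this would look like (the version numbers here are purely illustrative):

```cmake
# Version the shared library: produces libarrow.so.0.9.0 on Unix with a
# libarrow.so.0 soname symlink (numbers illustrative, not a proposal).
set_target_properties(arrow PROPERTIES
  VERSION   0.9.0   # file version:  libarrow.so.0.9.0
  SOVERSION 0)      # ABI version:   soname libarrow.so.0
```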



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2033) pa.array() doesn't work with iterators

2018-01-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2033:
-

 Summary: pa.array() doesn't work with iterators
 Key: ARROW-2033
 URL: https://issues.apache.org/jira/browse/ARROW-2033
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


pa.array handles iterables fine, but not iterators if size isn't passed:
{code:java}
>>> arr = pa.array(range(5))
>>> arr

[
  0,
  1,
  2,
  3,
  4
]
>>> arr = pa.array(iter(range(5)))
>>> arr

[
  NA,
  NA,
  NA,
  NA,
  NA
]
{code}

This is because InferArrowSize() first exhausts the iterator.
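A possible workaround on the caller's side is to materialize iterators before handing them to a size-inferring converter; a generic sketch (the helper name is made up, not a pyarrow API):

```python
from collections.abc import Iterator

def ensure_sized(values):
    # Hypothetical helper: iterators are consumed once, so materialize
    # them into a list before a converter needs to infer the size.
    return list(values) if isinstance(values, Iterator) else values

assert ensure_sized(iter(range(5))) == [0, 1, 2, 3, 4]
assert ensure_sized([1, 2, 3]) == [1, 2, 3]
```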



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2054) Compilation warnings

2018-01-30 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2054:
-

 Summary: Compilation warnings
 Key: ARROW-2054
 URL: https://issues.apache.org/jira/browse/ARROW-2054
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


I suppose this may vary depending on the compiler, but I get the following 
warnings with gcc 4.9:
{code}
/home/antoine/arrow/cpp/src/plasma/fling.cc: In function ‘int send_fd(int, 
int)’:
/home/antoine/arrow/cpp/src/plasma/fling.cc:46:50: warning: dereferencing 
type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   *reinterpret_cast<int*>(CMSG_DATA(header)) = fd;
  ^
/home/antoine/arrow/cpp/src/arrow/python/io.cc: In member function ‘virtual 
arrow::Status arrow::py::PyReadableFile::Read(int64_t, 
std::shared_ptr<arrow::Buffer>*)’:
/home/antoine/arrow/cpp/src/arrow/python/io.cc:153:60: warning: ‘bytes_obj’ may 
be used uninitialized in this function [-Wmaybe-uninitialized]
   Py_DECREF(bytes_obj);
^
/home/antoine/arrow/cpp/src/arrow/python/io.cc: In member function ‘virtual 
arrow::Status arrow::py::PyReadableFile::Read(int64_t, int64_t*, void*)’:
/home/antoine/arrow/cpp/src/arrow/python/io.cc:141:60: warning: ‘bytes_obj’ may 
be used uninitialized in this function [-Wmaybe-uninitialized]
   Py_DECREF(bytes_obj);
^
/home/antoine/arrow/cpp/src/arrow/python/io.cc: In member function ‘virtual 
arrow::Status arrow::py::PyReadableFile::GetSize(int64_t*)’:
/home/antoine/arrow/cpp/src/arrow/python/io.cc:187:20: warning: ‘file_size’ may 
be used uninitialized in this function [-Wmaybe-uninitialized]
   *size = file_size;
^
/home/antoine/arrow/cpp/src/arrow/python/io.cc:46:65: warning: 
‘current_position’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  const_cast<char*>(argspec), args...);
 ^
/home/antoine/arrow/cpp/src/arrow/python/io.cc:175:11: note: ‘current_position’ 
was declared here
   int64_t current_position;
   ^
/home/antoine/arrow/cpp/src/arrow/ipc/json-internal.cc: In function 
‘arrow::Status arrow::ipc::internal::json::GetField(const Value&, const 
arrow::ipc::DictionaryMemo*, std::shared_ptr<arrow::Field>*)’:
/home/antoine/arrow/cpp/src/arrow/ipc/json-internal.cc:876:81: warning: 
‘dictionary_id’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 RETURN_NOT_OK(dictionary_memo->GetDictionary(dictionary_id, &dictionary));

 ^
/home/antoine/arrow/cpp/src/arrow/ipc/json-internal.cc: In function 
‘arrow::Status arrow::ipc::internal::json::ReadSchema(const Value&, 
arrow::MemoryPool*, std::shared_ptr<arrow::Schema>*)’:
/home/antoine/arrow/cpp/src/arrow/ipc/json-internal.cc:1354:80: warning: 
‘dictionary_id’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 RETURN_NOT_OK(dictionary_memo->AddDictionary(dictionary_id, dictionary));

^
/home/antoine/arrow/cpp/src/arrow/ipc/json-internal.cc:1349:13: note: 
‘dictionary_id’ was declared here
 int64_t dictionary_id;
 ^
In file included from /home/antoine/arrow/cpp/src/arrow/api.h:25:0,
 from 
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:29:
/home/antoine/arrow/cpp/src/arrow/builder.h: In member function ‘arrow::Status 
arrow::py::TimestampConverter::AppendItem(const arrow::py::OwnedRef&)’:
/home/antoine/arrow/cpp/src/arrow/builder.h:284:5: warning: ‘t’ may be used 
uninitialized in this function [-Wmaybe-uninitialized]
 raw_data_[length_++] = val;
 ^
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:576:13: note: ‘t’ 
was declared here
 int64_t t;
 ^

{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2073) [Python] Create StructArray from sequence of tuples given a known data type

2018-02-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2073:
-

 Summary: [Python] Create StructArray from sequence of tuples given 
a known data type
 Key: ARROW-2073
 URL: https://issues.apache.org/jira/browse/ARROW-2073
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Following ARROW-1705, we should support calling {{pa.array}} with a sequence of 
tuples, presuming a struct type is passed for the {{type}} parameter.
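The conversion step is essentially a positional mapping from tuples onto the struct's fields; a sketch with illustrative field names (not the actual pyarrow implementation):

```python
# Given a known struct type (an ordered list of field names), a sequence
# of tuples maps positionally onto per-field columns.
fields = ["x", "y"]
rows = [(1, "a"), (2, "b"), (3, "c")]
columns = {name: [row[i] for row in rows] for i, name in enumerate(fields)}
assert columns == {"x": [1, 2, 3], "y": ["a", "b", "c"]}
```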



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2074) [Python] Allow type inference for struct arrays

2018-02-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2074:
-

 Summary: [Python] Allow type inference for struct arrays
 Key: ARROW-2074
 URL: https://issues.apache.org/jira/browse/ARROW-2074
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Support inferring a struct type in a {{pa.array}} call, if a sequence of dicts 
(or a dict of sequences?) is given. Of course, this could mean that the wrong 
field order may be inferred, though on Python 3.6+ dicts retain ordering until 
the first deletion.
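The inference step amounts to collecting field names in first-seen order across the input dicts; a sketch relying on insertion-ordered dicts (Python 3.6+):

```python
# Infer struct field names, in first-seen order, from a sequence of dicts;
# dict.fromkeys deduplicates while preserving insertion order.
rows = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
names = list(dict.fromkeys(key for row in rows for key in row))
assert names == ["a", "b", "c"]
```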



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2067) "pip install" doesn't work from source tree

2018-01-31 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2067:
-

 Summary: "pip install" doesn't work from source tree
 Key: ARROW-2067
 URL: https://issues.apache.org/jira/browse/ARROW-2067
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


It seems that {{pip install .}} first copies the build dir into a temporary 
directory, and {{setuptools_scm}} then fails grabbing the git version from that 
location.

AFAIR {{versioneer}} doesn't have that issue.

{code:bash}
$ pip install .
Processing /home/antoine/arrow/python
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-v_mucrpj-build/setup.py", line 456, in <module>
    url="https://arrow.apache.org/",
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/site-packages/setuptools/__init__.py",
 line 129, in setup
    return distutils.core.setup(**attrs)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/core.py", line 
108, in setup
    _setup_distribution = dist = klass(attrs)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/site-packages/setuptools/dist.py",
 line 333, in __init__
    _Distribution.__init__(self, attrs)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/dist.py", line 
281, in __init__
    self.finalize_options()
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/site-packages/setuptools/dist.py",
 line 476, in finalize_options
    ep.load()(self, ep.name, value)
  File 
"/tmp/pip-v_mucrpj-build/.eggs/setuptools_scm-1.15.7-py3.6.egg/setuptools_scm/integration.py",
 line 22, in version_keyword
    dist.metadata.version = get_version(**value)
  File 
"/tmp/pip-v_mucrpj-build/.eggs/setuptools_scm-1.15.7-py3.6.egg/setuptools_scm/__init__.py",
 line 119, in get_version
    parsed_version = _do_parse(root, parse)
  File 
"/tmp/pip-v_mucrpj-build/.eggs/setuptools_scm-1.15.7-py3.6.egg/setuptools_scm/__init__.py",
 line 97, in _do_parse
    "use git+https://github.com/user/proj.git#egg=proj" % root)
    LookupError: setuptools-scm was unable to detect version for 
'/tmp/pip-v_mucrpj-build'.
    
    Make sure you're either building from a fully intact git repository or PyPI 
tarballs. Most other sources (such as GitHub's tarballs, a git checkout without 
the .git folder) don't contain the necessary metadata and will not work.
    
    For example, if you're using pip, instead of 
https://github.com/user/proj/archive/master.zip use 
git+https://github.com/user/proj.git#egg=proj
    
    
Command "python setup.py egg_info" failed with error code 1 in 
/tmp/pip-v_mucrpj-build/
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2072) [Python] decimal128.byte_width crashes

2018-01-31 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2072:
-

 Summary: [Python] decimal128.byte_width crashes
 Key: ARROW-2072
 URL: https://issues.apache.org/jira/browse/ARROW-2072
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{code:bash}
$ python -c "import pyarrow as pa; ty = pa.decimal128(20, 7); 
print(ty.byte_width)"
Segmentation fault (core dumped)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2052) Unify OwnedRef and ScopedRef

2018-01-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2052:
-

 Summary: Unify OwnedRef and ScopedRef
 Key: ARROW-2052
 URL: https://issues.apache.org/jira/browse/ARROW-2052
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Currently {{OwnedRef}} and {{ScopedRef}} have similar semantics with small 
differences. Furthermore, the naming distinction isn't obvious.

I propose to unify them as a single {{OwnedRef}} class with the following 
characteristics:
- doesn't take the GIL automatically
- has a {{release()}} method that decrefs the pointer (and sets the internal 
copy to NULL) before returning it
- has a {{detach()}} method that returns the pointer (and sets the internal 
copy to NULL) without decrefing it

For the rare situations where an {{OwnedRef}} may be destroyed with the GIL 
released, an {{OwnedRefNoGIL}} derived class would also be proposed (the naming 
scheme follows Cython here).

Opinions / comments?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2092) [Python] Enhance benchmark suite

2018-02-05 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2092:
-

 Summary: [Python] Enhance benchmark suite
 Key: ARROW-2092
 URL: https://issues.apache.org/jira/browse/ARROW-2092
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


We need to test more operations in the ASV-based benchmark suite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2111) [C++] Linting could be faster

2018-02-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2111:
-

 Summary: [C++] Linting could be faster
 Key: ARROW-2111
 URL: https://issues.apache.org/jira/browse/ARROW-2111
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Currently {{make lint}} style-checks C++ files sequentially (by calling 
{{cpplint}}). We could instead style-check those files in parallel.
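A minimal Python sketch of the idea (the {{lint_files}} helper and its parameters are hypothetical, not Arrow's actual build code): fan the per-file checks out to a thread pool instead of looping.

```python
from concurrent.futures import ThreadPoolExecutor

def lint_files(filenames, check, max_workers=8):
    """Run one style check per file concurrently, returning results in
    input order. `check` stands in for a cpplint invocation, e.g.
    lambda path: subprocess.call(['cpplint', path])."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check, filenames))
```

Since cpplint runs as a subprocess and is I/O-bound from the caller's perspective, threads suffice here; no multiprocessing needed.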



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2134) [CI] Make Travis commit inspection more robust

2018-02-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2134:
-

 Summary: [CI] Make Travis commit inspection more robust
 Key: ARROW-2134
 URL: https://issues.apache.org/jira/browse/ARROW-2134
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See [https://github.com/apache/arrow/pull/1586#issuecomment-364857558]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2138) [C++] Have FatalLog abort instead of exiting

2018-02-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2138:
-

 Summary: [C++] Have FatalLog abort instead of exiting
 Key: ARROW-2138
 URL: https://issues.apache.org/jira/browse/ARROW-2138
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


Not sure this is desirable, since {{util/logging.h}} was taken from glog, but 
the various debug checks currently call {{std::exit(1)}} on failure. This is a 
clean exit (though with an error code) and therefore doesn't trigger the usual 
debugging tools such as gdb or Python's faulthandler. By replacing it with 
something like {{std::abort()}}, the exit would be recognized as a process crash.

 

Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-02-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2142:
-

 Summary: [Python] Conversion from Numpy struct array unimplemented
 Key: ARROW-2142
 URL: https://issues.apache.org/jira/browse/ARROW-2142
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
>>> arr
array([(1.5,)], dtype=[('x', '<f4')])
>>> arr[0]
(1.5,)
>>> arr['x']
array([1.5], dtype=float32)
>>> arr['x'][0]
1.5
>>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
  File "array.pxi", line 177, in pyarrow.lib.array
  File "error.pxi", line 77, in pyarrow.lib.check_status
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: 
/home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
converter.Convert()
NumPyConverter doesn't implement  conversion.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2140) [Python] Conversion from Numpy float16 array unimplemented

2018-02-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2140:
-

 Summary: [Python] Conversion from Numpy float16 array unimplemented
 Key: ARROW-2140
 URL: https://issues.apache.org/jira/browse/ARROW-2140
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.array([1.5], dtype=np.float16)
>>> pa.array(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pa.array(arr)
  File "array.pxi", line 177, in pyarrow.lib.array
  File "array.pxi", line 84, in pyarrow.lib._ndarray_to_array
  File "public-api.pxi", line 158, in pyarrow.lib.pyarrow_wrap_array
KeyError: 10
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2141) [Python] Conversion from Numpy object array to varsize binary unimplemented

2018-02-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2141:
-

 Summary: [Python] Conversion from Numpy object array to varsize 
binary unimplemented
 Key: ARROW-2141
 URL: https://issues.apache.org/jira/browse/ARROW-2141
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.array([b'xx'], dtype=np.object)
>>> pa.array(arr, type=pa.binary(2))

[
  b'xx'
]
>>> pa.array(arr, type=pa.binary())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    pa.array(arr, type=pa.binary())
  File "array.pxi", line 177, in pyarrow.lib.array
  File "error.pxi", line 77, in pyarrow.lib.check_status
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: 
/home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
converter.Convert()
/home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1098 code: 
compute::Cast(, *arr, type_, options, )
/home/antoine/arrow/cpp/src/arrow/compute/kernels/cast.cc:1022 code: Cast(ctx, 
Datum(array.data()), out_type, options, _out)
/home/antoine/arrow/cpp/src/arrow/compute/kernels/cast.cc:1009 code: 
GetCastFunction(*value.type(), out_type, options, )
No cast implemented from binary to binary
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2147) [Python] Type inference doesn't work on lists of Numpy arrays

2018-02-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2147:
-

 Summary: [Python] Type inference doesn't work on lists of Numpy 
arrays
 Key: ARROW-2147
 URL: https://issues.apache.org/jira/browse/ARROW-2147
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.int16([2, 3, 4])
>>> pa.array(arr)

[
  2,
  3,
  4
]
>>> pa.array([arr])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    pa.array([arr])
  File "array.pxi", line 181, in pyarrow.lib.array
  File "array.pxi", line 26, in pyarrow.lib._sequence_to_array
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: /home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:964 
code: InferArrowType(seq, _type)
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:321 code: 
seq_visitor.Visit(obj)
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:195 code: 
VisitElem(ref, level)
Error inferring Arrow data type for collection of Python objects. Got Python 
object of type ndarray but can only handle these types: bool, float, integer, 
date, datetime, bytes, unicode
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2150) [Python] array equality defaults to identity

2018-02-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2150:
-

 Summary: [Python] array equality defaults to identity
 Key: ARROW-2150
 URL: https://issues.apache.org/jira/browse/ARROW-2150
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


I'm not sure this is deliberate, but it doesn't look very desirable to me:
{code}
>>> pa.array([1,2,3], type=pa.int32()) == pa.array([1,2,3], type=pa.int32())
False
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2148) [Python] to_pandas() on struct array returns object array

2018-02-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2148:
-

 Summary: [Python] to_pandas() on struct array returns object array
 Key: ARROW-2148
 URL: https://issues.apache.org/jira/browse/ARROW-2148
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Antoine Pitrou


This should probably return a Numpy struct array instead:

{code:python}
>>> arr = pa.array([{'a': 1, 'b': 2.5}, {'a': 2, 'b': 3.5}],
...                type=pa.struct([pa.field('a', pa.int32()), pa.field('b', pa.float64())]))
>>> arr.type
StructType(struct)
>>> arr.to_pandas()
array([{'a': 1, 'b': 2.5}, {'a': 2, 'b': 3.5}], dtype=object)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2149) [Python] reorganize test_convert_pandas.py

2018-02-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2149:
-

 Summary: [Python] reorganize test_convert_pandas.py
 Key: ARROW-2149
 URL: https://issues.apache.org/jira/browse/ARROW-2149
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{{test_convert_pandas.py}} is getting painful to navigate through. We should 
reorganize the tests in various classes / categories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2151) [Python] Error when converting from list of uint64 arrays

2018-02-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2151:
-

 Summary: [Python] Error when converting from list of uint64 arrays
 Key: ARROW-2151
 URL: https://issues.apache.org/jira/browse/ARROW-2151
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> pa.array(np.uint64([0,1,2]), type=pa.uint64())

[
  0,
  1,
  2
]
>>> pa.array([np.uint64([0,1,2])], type=pa.list_(pa.uint64()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pa.array([np.uint64([0,1,2])], type=pa.list_(pa.uint64()))
  File "array.pxi", line 181, in pyarrow.lib.array
  File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
  File "error.pxi", line 98, in pyarrow.lib.check_status
ArrowException: Unknown error: 
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:979 code: 
AppendPySequence(seq, size, real_type, builder.get())
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:402 code: 
static_cast(this)->AppendSingle(ref.obj())
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:402 code: 
static_cast(this)->AppendSingle(ref.obj())
/home/antoine/arrow/cpp/src/arrow/python/builtin_convert.cc:542 code: 
CheckPyError()
an integer is required
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2155) [Python] pa.frombuffer(bytearray) returns immutable Buffer

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2155:
-

 Summary: [Python] pa.frombuffer(bytearray) returns immutable Buffer
 Key: ARROW-2155
 URL: https://issues.apache.org/jira/browse/ARROW-2155
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


I'd expect it to return a mutable buffer:
{code:python}
>>> pa.frombuffer(bytearray(10)).is_mutable
False
{code}
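For comparison, the stdlib buffer protocol already propagates writability: a {{memoryview}} over a {{bytearray}} is mutable, while one over {{bytes}} is read-only. A pyarrow {{Buffer}} built from a {{bytearray}} could reasonably do the same.

```python
# memoryview inherits writability from its source object.
ba = bytearray(b'hello')
mv = memoryview(ba)
assert not mv.readonly
mv[0] = ord('H')                      # writes through to the bytearray
assert ba == bytearray(b'Hello')
assert memoryview(b'hello').readonly  # bytes gives a read-only view
```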



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2154) [Python] __eq__ unimplemented on Buffer

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2154:
-

 Summary: [Python] __eq__ unimplemented on Buffer
 Key: ARROW-2154
 URL: https://issues.apache.org/jira/browse/ARROW-2154
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Having to call {{equals()}} is un-Pythonic:
{code:python}
>>> pa.frombuffer(b'foo') == pa.frombuffer(b'foo')
False
>>> pa.frombuffer(b'foo').equals(pa.frombuffer(b'foo'))
True
{code}

Same for many other pyarrow types, incidentally.
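A sketch of the usual Python pattern (toy class, not pyarrow's actual implementation): delegate {{\_\_eq\_\_}} to {{equals()}} and return {{NotImplemented}} for foreign types so reflected comparison still works.

```python
class Buffer:
    """Toy stand-in for pyarrow.Buffer illustrating __eq__ delegation."""
    def __init__(self, data):
        self._data = bytes(data)

    def equals(self, other):
        return self._data == other._data

    def __eq__(self, other):
        if isinstance(other, Buffer):
            return self.equals(other)
        return NotImplemented  # let Python try the reflected operation
```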



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2156) [CI] Isolate Sphinx dependencies

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2156:
-

 Summary: [CI] Isolate Sphinx dependencies
 Key: ARROW-2156
 URL: https://issues.apache.org/jira/browse/ARROW-2156
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


In the Travis Python test script, we always install the documentation 
dependencies. We should only install them when building the docs, since they 
are non-trivial and can take time to fetch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2108) [Python] Update instructions for ASV

2018-02-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2108:
-

 Summary: [Python] Update instructions for ASV
 Key: ARROW-2108
 URL: https://issues.apache.org/jira/browse/ARROW-2108
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Now that PR [https://github.com/airspeed-velocity/asv/pull/611] has been 
merged, we don't need to advertise our fork anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2193) [Plasma] plasma_store forks endlessly

2018-02-21 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2193:
-

 Summary: [Plasma] plasma_store forks endlessly
 Key: ARROW-2193
 URL: https://issues.apache.org/jira/browse/ARROW-2193
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Reporter: Antoine Pitrou


I'm not sure why, but when I run the pyarrow test suite (for example {{py.test 
pyarrow/tests/test_plasma.py}}), plasma_store forks endlessly:

{code:bash}
 $ ps fuwww
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
[...]
antoine  27869 12.0  0.4 863208 68976 pts/7S13:41   0:01 
/home/antoine/miniconda3/envs/pyarrow/bin/python 
/home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 -m 
1
antoine  27885 13.0  0.4 863076 68560 pts/7S13:41   0:01  \_ 
/home/antoine/miniconda3/envs/pyarrow/bin/python 
/home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 -m 
1
antoine  27901 12.1  0.4 863076 68320 pts/7S13:41   0:01  \_ 
/home/antoine/miniconda3/envs/pyarrow/bin/python 
/home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 -m 
1
antoine  27920 13.6  0.4 863208 68868 pts/7S13:41   0:01  \_ 
/home/antoine/miniconda3/envs/pyarrow/bin/python 
/home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 -m 
1
[etc.]
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2172) [Python] Incorrect conversion from Numpy array when stride % itemsize != 0

2018-02-19 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2172:
-

 Summary: [Python] Incorrect conversion from Numpy array when 
stride % itemsize != 0
 Key: ARROW-2172
 URL: https://issues.apache.org/jira/browse/ARROW-2172
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


In the example below, the input array has a stride that's not a multiple of the 
itemsize:

{code:python}
>>> data = np.array([(42, True), (43, False)],
...                 dtype=[('x', np.int32), ('y', np.bool_)])
>>> data['x']
array([42, 43], dtype=int32)
>>> pa.array(data['x'], type=pa.int32())

[
  42,
  11009
]
{code}
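The surprising second value comes from the memory layout: each packed record is 5 bytes (int32 + bool), so the {{'x'}} field view strides by 5 bytes while its itemsize is 4. A quick check with plain Numpy:

```python
import numpy as np

# Each record is 5 packed bytes (int32 + bool), so the 'x' view has a
# 5-byte stride over 4-byte items: stride % itemsize != 0.
data = np.array([(42, True), (43, False)],
                dtype=[('x', np.int32), ('y', np.bool_)])
x = data['x']
assert x.itemsize == 4
assert x.strides == (5,)
```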



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2173) [Python] NumPyBuffer destructor should hold the GIL

2018-02-19 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2173:
-

 Summary: [Python] NumPyBuffer destructor should hold the GIL
 Key: ARROW-2173
 URL: https://issues.apache.org/jira/browse/ARROW-2173
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Failure to hold the GIL can lead to crashes, depending on the presence of other 
threads and on what the object allocator needs to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2171) [Python] OwnedRef is fragile

2018-02-19 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2171:
-

 Summary: [Python] OwnedRef is fragile
 Key: ARROW-2171
 URL: https://issues.apache.org/jira/browse/ARROW-2171
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Some uses of OwnedRef can implicitly invoke its (default) copy constructor, 
which will lead to extraneous decrefs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2197) Document "undefined symbol" issue and workaround

2018-02-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2197:
-

 Summary: Document "undefined symbol" issue and workaround
 Key: ARROW-2197
 URL: https://issues.apache.org/jira/browse/ARROW-2197
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See [https://github.com/apache/arrow/issues/1612]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2218) [Python] PythonFile should infer mode when not given

2018-02-26 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2218:
-

 Summary: [Python] PythonFile should infer mode when not given
 Key: ARROW-2218
 URL: https://issues.apache.org/jira/browse/ARROW-2218
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


The following is clearly not optimal:

{code:python}
>>> f = open('README.md', 'r')
>>> pa.PythonFile(f).mode
'wb'
{code}
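A possible inference helper, sketched against the standard {{io}} interface (the {{infer_mode}} name and the exact mode strings are hypothetical, not pyarrow's API):

```python
import io

def infer_mode(f):
    """Hypothetical sketch: derive a pyarrow-style mode string from a
    Python file object instead of defaulting to 'wb'."""
    readable = getattr(f, 'readable', lambda: False)()
    writable = getattr(f, 'writable', lambda: False)()
    if readable and writable:
        return 'rb+'
    if readable:
        return 'rb'
    if writable:
        return 'wb'
    raise ValueError("file object is neither readable nor writable")
```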




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2950) [C++] Clean up util/bit-util.h

2018-07-31 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2950:
-

 Summary: [C++] Clean up util/bit-util.h
 Key: ARROW-2950
 URL: https://issues.apache.org/jira/browse/ARROW-2950
 Project: Apache Arrow
  Issue Type: Task
Reporter: Antoine Pitrou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3029) [Python] pkg_resources is slow

2018-08-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3029:
-

 Summary: [Python] pkg_resources is slow
 Key: ARROW-3029
 URL: https://issues.apache.org/jira/browse/ARROW-3029
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.10.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Importing and calling {{pkg_resources}} at pyarrow import time to get the 
version number is slow (around 200 ms here, out of 640 ms total for importing 
pyarrow).

Instead we could generate a version file, which seems possible using 
{{setuptools_scm}}'s {{write_to}} parameter: 
https://github.com/pypa/setuptools_scm/#configuration-parameters
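The {{setup.py}} change might look roughly like this fragment (the target file name is a placeholder, not necessarily the one that would be used):

```python
from setuptools import setup

# Hypothetical fragment: setuptools_scm writes the version out at build
# time, so importing pyarrow no longer needs pkg_resources at runtime.
setup(
    use_scm_version={'write_to': 'pyarrow/_generated_version.py'},
    setup_requires=['setuptools_scm'],
    # ... other arguments unchanged ...
)
```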




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3059) [C++] Streamline namespace array::test

2018-08-15 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3059:
-

 Summary: [C++] Streamline namespace array::test
 Key: ARROW-3059
 URL: https://issues.apache.org/jira/browse/ARROW-3059
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


Currently we have some test helpers that live in the {{arrow::test}} namespace, 
some in {{arrow}} (or topic subnamespaces such as {{arrow::io}}). I see no 
reason for the discrepancy.

I propose the simple solution of removing the {{arrow::test}} namespace 
altogether. If not desirable, then we should make sure we put all helpers in 
that namespace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3060) [C++] Factor out parsing routines

2018-08-15 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3060:
-

 Summary: [C++] Factor out parsing routines
 Key: ARROW-3060
 URL: https://issues.apache.org/jira/browse/ARROW-3060
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


We have implementations of casting strings to numbers in the {{compute}} 
directory. Those can be more broadly useful (for example when parsing CSV 
files). We should therefore centralize them in their own C++ module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2992) [Python] Parquet benchmark failure

2018-08-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2992:
-

 Summary: [Python] Parquet benchmark failure
 Key: ARROW-2992
 URL: https://issues.apache.org/jira/browse/ARROW-2992
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This is a regression on git master:
{code:python}
Traceback (most recent call last):
  File "/home/antoine/asv/asv/benchmark.py", line 867, in <module>
commands[mode](args)
  File "/home/antoine/asv/asv/benchmark.py", line 844, in main_run
result = benchmark.do_run()
  File "/home/antoine/asv/asv/benchmark.py", line 398, in do_run
return self.run(*self._current_params)
  File "/home/antoine/asv/asv/benchmark.py", line 473, in run
samples, number = self.benchmark_timing(timer, repeat, warmup_time, 
number=number)
  File "/home/antoine/asv/asv/benchmark.py", line 520, in benchmark_timing
timing = timer.timeit(number)
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/timeit.py", line 
178, in timeit
timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "/home/antoine/asv/asv/benchmark.py", line 464, in <lambda>
func = lambda: self.func(*param)
  File "/home/antoine/arrow/python/benchmarks/parquet.py", line 54, in 
time_manifest_creation
pq.ParquetManifest(self.tmpdir, thread_pool=thread_pool)
TypeError: __init__() got an unexpected keyword argument 'thread_pool'
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2991) [CI] Cut down number of AppVeyor jobs

2018-08-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2991:
-

 Summary: [CI] Cut down number of AppVeyor jobs
 Key: ARROW-2991
 URL: https://issues.apache.org/jira/browse/ARROW-2991
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


AppVeyor builds all jobs serially so it's important not to have too many of 
them to avoid builds taking too much time and queuing up.

I suggest removing the following jobs:
- the Release build with Ninja and VS2015; we already have both a Release build 
with Ninja and VS2017, and a Debug build with Ninja and VS2015
- the two NMake builds: we already exercise the Ninja (cross-platform, fastest) 
and Visual Studio (standard under Windows) build chains

[~Max Risuhin] you added some of those jobs, do you have any concerns?




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3049) [C++/Python] ORC reader fails on empty file

2018-08-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3049:
-

 Summary: [C++/Python] ORC reader fails on empty file
 Key: ARROW-3049
 URL: https://issues.apache.org/jira/browse/ARROW-3049
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


{code}
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_orc.py", line 83, in 
test_orcfile_empty
check_example('TestOrcFile.emptyFile')
  File "/home/antoine/arrow/python/pyarrow/tests/test_orc.py", line 79, in 
check_example
os.path.join(orc_data_dir, '%s.jsn.gz' % name))
  File "/home/antoine/arrow/python/pyarrow/tests/test_orc.py", line 62, in 
check_example_files
table = orc_file.read()
  File "/home/antoine/arrow/python/pyarrow/orc.py", line 149, in read
return self.reader.read(include_indices=include_indices)
  File "pyarrow/_orc.pyx", line 106, in pyarrow._orc.ORCReader.read
check_status(deref(self.reader).Read(_table))
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: Must pass at least one record batch
{code}

[~jim.crist]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3047) [C++] cmake downloads and builds ORC even though it's installed

2018-08-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3047:
-

 Summary: [C++] cmake downloads and builds ORC even though it's 
installed
 Key: ARROW-3047
 URL: https://issues.apache.org/jira/browse/ARROW-3047
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


I have installed orc 1.5.1 from conda-forge, but our cmake build chain still 
tries to build protobuf and ORC from source (and fails).

{code:bash}
$ ls $CONDA_PREFIX/include/orc/
ColumnPrinter.hh  Common.hh  Exceptions.hh  Int128.hh  MemoryPool.hh  
orc-config.hh  OrcFile.hh  Reader.hh  Statistics.hh  Type.hh  Vector.hh  
Writer.hh
$ ls -l $CONDA_PREFIX/lib/liborc*
-rw-rw-r-- 2 antoine antoine 1952298 juin  20 17:32 
/home/antoine/miniconda3/envs/pyarrow/lib/liborc.a
{code}

[~jim.crist]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3095) [Python] test_plasma.py fails

2018-08-20 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3095:
-

 Summary: [Python] test_plasma.py fails
 Key: ARROW-3095
 URL: https://issues.apache.org/jira/browse/ARROW-3095
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++), Python
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


All tests in {{test_plasma.py}} fail here. It seems that plasma_store fails to 
launch:
{code}
$ python -m pytest -x -r s --tb=native pyarrow/tests/test_plasma.py 
=========================== test session starts ===========================
platform linux -- Python 3.7.0, pytest-3.7.2, py-1.5.4, pluggy-0.7.1
rootdir: /home/antoine/arrow/python, inifile: setup.cfg
plugins: timeout-1.3.1, faulthandler-1.5.0
collected 24 items  


pyarrow/tests/test_plasma.py E

================================= ERRORS ==================================
______ ERROR at setup of TestPlasmaClient.test_connection_failure_raises_exception ______
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 119, in 
setup_method
self.plasma_client = plasma.connect(plasma_store_name, "", 64)
  File "pyarrow/_plasma.pyx", line 691, in pyarrow._plasma.connect
check_status(result.client.get()
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
raise ArrowIOError(message)
pyarrow.lib.ArrowIOError: ../src/plasma/client.cc:921 code: 
ConnectIpcSocketRetry(store_socket_name, num_retries, -1, _conn_)
Could not connect to socket /tmp/test_plasma-ikgi25pf/plasma.sock
--------------------------- Captured stderr setup ---------------------------
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 50 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 49 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 48 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 47 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 46 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 45 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 44 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 43 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 42 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 41 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 40 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 39 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 38 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 37 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 36 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 35 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 34 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 33 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 32 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 31 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 30 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 29 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 28 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 27 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 26 more times
Connection to IPC socket failed for pathname 
/tmp/test_plasma-ikgi25pf/plasma.sock, retrying 25 more times
Connection to IPC 

[jira] [Created] (ARROW-3093) [C++] Linking errors with ORC enabled

2018-08-20 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3093:
-

 Summary: [C++] Linking errors with ORC enabled
 Key: ARROW-3093
 URL: https://issues.apache.org/jira/browse/ARROW-3093
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my conda 
environment, and now I get linking errors if ORC support is enabled:

{code}
debug/libarrow.so.11.0.0: error: undefined reference to 
'google::protobuf::MessageLite::ParseFromString(std::string const&)'
debug/libarrow.so.11.0.0: error: undefined reference to 
'google::protobuf::MessageLite::SerializeToString(std::string*) const'
debug/libarrow.so.11.0.0: error: undefined reference to 
'google::protobuf::internal::fixed_address_empty_string'
[etc.]
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3092) [C++] Segfault in json-integration-test

2018-08-20 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3092:
-

 Summary: [C++] Segfault in json-integration-test
 Key: ARROW-3092
 URL: https://issues.apache.org/jira/browse/ARROW-3092
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


I've upgraded to Ubuntu 18.04.1 and now I get segfaults in 
json-integration-test:

{code}
(gdb) run
Starting program: 
/home/antoine/arrow/cpp/build-test/debug/json-integration-test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[==] Running 2 tests from 1 test case.
[--] Global test environment set-up.
[--] 2 tests from TestJSONIntegration
[ RUN  ] TestJSONIntegration.ConvertAndValidate

Program received signal SIGSEGV, Segmentation fault.
std::string::_Rep::_M_is_leaked (this=this@entry=0xffe8)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3075
3075
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:
 No such file or directory.
(gdb) bt
#0  std::string::_Rep::_M_is_leaked (this=this@entry=0xffe8)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3075
#1  0x77311856 in std::string::_Rep::_M_grab (this=0xffe8, 
__alloc1=..., __alloc2=...)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3126
#2  0x7731189d in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7fffcf68, __str=...)
at 
/home/msarahan/miniconda2/conda-bld/compilers_linux-64_1507259624353/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:613
#3  0x005f63fd in boost::filesystem::path::path (this=0x7fffcf68, 
p=...)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/filesystem/path.hpp:137
#4  0x005f628a in boost::filesystem::operator/ (lhs=..., rhs=...)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/filesystem/path.hpp:792
#5  0x005f1d37 in arrow::ipc::temp_path () at 
../src/arrow/ipc/json-integration-test.cc:233
#6  0x005f3038 in arrow::ipc::TestJSONIntegration::mkstemp (this=)
at ../src/arrow/ipc/json-integration-test.cc:241
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3091) [C++] Segfault in io-hdfs-test

2018-08-20 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3091:
-

 Summary: [C++] Segfault in io-hdfs-test
 Key: ARROW-3091
 URL: https://issues.apache.org/jira/browse/ARROW-3091
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


I've upgraded to Ubuntu 18.04.1 and now I get segfaults in io-hdfs-test:

{code}
(gdb) run
Starting program: /home/antoine/arrow/cpp/build-test/debug/io-hdfs-test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running main() from gtest_main.cc
[==] Running 24 tests from 2 test cases.
[--] Global test environment set-up.
[--] 12 tests from TestHadoopFileSystem/0, where TypeParam = 
arrow::io::JNIDriver
[ RUN  ] TestHadoopFileSystem/0.ConnectsAgain

Program received signal SIGSEGV, Segmentation fault.
0x775a15ae in boost::filesystem::path::m_append_separator_if_needed() ()
   from /home/antoine/miniconda3/envs/pyarrow/lib/libboost_filesystem.so.1.67.0
(gdb) bt
#0  0x775a15ae in 
boost::filesystem::path::m_append_separator_if_needed() ()
   from /home/antoine/miniconda3/envs/pyarrow/lib/libboost_filesystem.so.1.67.0
#1  0x775a2917 in 
boost::filesystem::path::operator/=(boost::filesystem::path const&) ()
   from /home/antoine/miniconda3/envs/pyarrow/lib/libboost_filesystem.so.1.67.0
#2  0x00549647 in boost::filesystem::operator/ (lhs=..., rhs=...)
at 
/home/antoine/miniconda3/envs/pyarrow/include/boost/filesystem/path.hpp:792
#3  0x00547b2d in 
arrow::io::TestHadoopFileSystem::SetUp (this=0x7143c0) at 
../src/arrow/io/io-hdfs-test.cc:98
#4  0x0065e98e in 
testing::internal::HandleSehExceptionsInMethodIfSupported 
(object=0x7143c0, 
method= testing::Test::SetUp(), location=0x66a48a "SetUp()")
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2402
#5  0x0064a7e5 in 
testing::internal::HandleExceptionsInMethodIfSupported 
(object=0x7143c0, 
method= testing::Test::SetUp(), location=0x66a48a "SetUp()")
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2438
#6  0x00632a14 in testing::Test::Run (this=0x7143c0)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2470
#7  0x006336fd in testing::TestInfo::Run (this=0x710420)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2656
#8  0x00633dbc in testing::TestCase::Run (this=0x7108f0)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2774
#9  0x0063b331 in testing::internal::UnitTestImpl::RunAllTests 
(this=0x710590)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:4649
#10 0x0066208e in 
testing::internal::HandleSehExceptionsInMethodIfSupported (object=0x710590, 
method=(bool 
(testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 
0x63b050 , location=0x66ac25 
"auxiliary test code (environments or event listeners)")
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2402
#11 0x0064c945 in 
testing::internal::HandleExceptionsInMethodIfSupported (object=0x710590, 
method=(bool 
(testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 
0x63b050 , location=0x66ac25 
"auxiliary test code (environments or event listeners)")
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2438
#12 0x0063b003 in testing::UnitTest::Run (this=0x6fcd48 
)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:4257
#13 0x00666481 in RUN_ALL_TESTS ()
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2233
#14 0x0066644c in main (argc=1, argv=0x7fffd878)
at 
/home/antoine/arrow/cpp/build-test/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:37
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3110) [C++] Compilation warnings with gcc 7.3.0

2018-08-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3110:
-

 Summary: [C++] Compilation warnings with gcc 7.3.0
 Key: ARROW-3110
 URL: https://issues.apache.org/jira/browse/ARROW-3110
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This is happening when building in release mode:
{code}
../src/arrow/python/python_to_arrow.cc: In function 'arrow::Status 
arrow::py::detail::BuilderAppend(arrow::BinaryBuilder*, PyObject*, bool*)':
../src/arrow/python/python_to_arrow.cc:388:56: warning: 'length' may be used 
uninitialized in this function [-Wmaybe-uninitialized]
   if (ARROW_PREDICT_FALSE(builder->value_data_length() + length > 
kBinaryMemoryLimit)) {
^
../src/arrow/python/python_to_arrow.cc:385:11: note: 'length' was declared here
   int32_t length;
   ^~
In file included from ../src/arrow/python/serialize.cc:32:0:
../src/arrow/builder.h: In member function 'arrow::Status 
arrow::py::SequenceBuilder::Update(int64_t, int8_t*)':
../src/arrow/builder.h:413:5: warning: 'offset32' may be used uninitialized in 
this function [-Wmaybe-uninitialized]
 raw_data_[length_++] = val;
 ^
../src/arrow/python/serialize.cc:90:13: note: 'offset32' was declared here
 int32_t offset32;
 ^~~~
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3099) [C++] Add benchmark for number parsing

2018-08-21 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3099:
-

 Summary: [C++] Add benchmark for number parsing
 Key: ARROW-3099
 URL: https://issues.apache.org/jira/browse/ARROW-3099
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Number parsing will become important once we have a CSV reader (or possibly 
other text-based formats). We should add benchmarks for the internal conversion 
routines.
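A micro-benchmark along these lines could serve as a starting point (a plain 
{{timeit}} sketch, not Arrow's benchmark harness; all names are illustrative):

```python
import timeit

# Hypothetical micro-benchmark sketch: time a string-to-integer conversion
# loop, the kind of routine a CSV reader exercises heavily.
values = [str(i) for i in range(1000)]
elapsed = timeit.timeit(lambda: [int(v) for v in values], number=100)
assert elapsed > 0.0
```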



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3100) [CI] C/glib build broken on OS X

2018-08-21 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3100:
-

 Summary: [CI] C/glib build broken on OS X
 Key: ARROW-3100
 URL: https://issues.apache.org/jira/browse/ARROW-3100
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, GLib
Reporter: Antoine Pitrou


The Travis-CI build fails to find luarocks:
https://travis-ci.org/apache/arrow/jobs/418753219#L2657

{code}
+sudo env PKG_CONFIG_PATH=:/usr/local/opt/libffi/lib/pkgconfig luarocks install 
lgi
env: luarocks: No such file or directory

The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_c_glib.sh" failed and 
exited with 127 during .
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3125) [Python] Update ASV instructions

2018-08-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3125:
-

 Summary: [Python] Update ASV instructions
 Key: ARROW-3125
 URL: https://issues.apache.org/jira/browse/ARROW-3125
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.10.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


The ability to define custom install / build / uninstall commands was added in 
mainline ASV in https://github.com/airspeed-velocity/asv/pull/699
We don't need to use our own fork / PR anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3140) [Plasma] Plasma fails building with GPU enabled

2018-08-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3140:
-

 Summary: [Plasma] Plasma fails building with GPU enabled
 Key: ARROW-3140
 URL: https://issues.apache.org/jira/browse/ARROW-3140
 Project: Apache Arrow
  Issue Type: Bug
  Components: GPU, Plasma (C++)
Reporter: Antoine Pitrou


{code}
In file included from ../src/plasma/client.h:30:0,
 from ../src/plasma/client.cc:20:
../src/plasma/common.h:120:19: error: ‘CudaIpcMemHandle’ was not declared in 
this scope
   std::shared_ptr<CudaIpcMemHandle> ipc_handle;
   ^~~~
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2913) [Python] Exported buffers don't expose type information

2018-07-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2913:
-

 Summary: [Python] Exported buffers don't expose type information
 Key: ARROW-2913
 URL: https://issues.apache.org/jira/browse/ARROW-2913
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


Using the {{buffers()}} method on array gives you a list of buffers backing the 
array, but those buffers lose typing information:
{code:python}
>>> a = pa.array(range(10))
>>> a.type
DataType(int64)
>>> buffers = a.buffers()
>>> [(memoryview(buf).format, memoryview(buf).shape) for buf in buffers]
[('b', (2,)), ('b', (80,))]
{code}

Conversely, Numpy exposes type information in the Python buffer protocol:
{code:python}
>>> a = pa.array(range(10))
>>> memoryview(a.to_numpy()).format
'l'
>>> memoryview(a.to_numpy()).shape
(10,)
{code}

Exposing type information on buffers could be important for third-party 
systems, such as Dask/distributed, for type-based data compression when 
serializing.

Since our C++ buffers are not typed, it's not obvious how to solve this. Should 
we return tensors instead?
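For comparison, the Python buffer protocol already allows re-attaching a type 
to raw bytes when the element type is known out-of-band. A minimal sketch using 
the stdlib (not the pyarrow API):

```python
import array

# Sketch: an untyped byte view (like a pyarrow.Buffer) can be re-typed by
# casting a memoryview, provided the element type is known externally.
a = array.array('q', range(10))    # int64 data
raw = memoryview(a).cast('B')      # untyped bytes view
typed = raw.cast('q')              # reapply the int64 ('q') interpretation
assert raw.format == 'B'
assert typed.tolist() == list(range(10))
```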



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2867) [Python] Incorrect example for Cython usage

2018-07-17 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2867:
-

 Summary: [Python] Incorrect example for Cython usage
 Key: ARROW-2867
 URL: https://issues.apache.org/jira/browse/ARROW-2867
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


When blindly pasting the Cython distutils example, one might get the following 
error:
{code}
Traceback (most recent call last):
  File "setup.py", line 20, in <module>
ext_modules=ext_modules,
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/core.py", 
line 148, in setup
dist.run_commands()
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/dist.py", 
line 955, in run_commands
self.run_command(cmd)
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/dist.py", 
line 974, in run_command
cmd_obj.run()
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/command/build_ext.py",
 line 339, in run
self.build_extensions()
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/command/build_ext.py",
 line 448, in build_extensions
self._build_extensions_serial()
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/command/build_ext.py",
 line 473, in _build_extensions_serial
self.build_extension(ext)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/command/build_ext.py",
 line 558, in build_extension
target_lang=language)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/ccompiler.py", 
line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/unixccompiler.py",
 line 159, in link
libraries)
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/ccompiler.py", 
line 1089, in gen_lib_options
lib_opts.append(compiler.library_dir_option(dir))
  File 
"/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/distutils/unixccompiler.py",
 line 207, in library_dir_option
return "-L" + dir
TypeError: must be str, not list
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3142) [C++] Fetch all libs from toolchain environment

2018-08-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3142:
-

 Summary: [C++] Fetch all libs from toolchain environment
 Key: ARROW-3142
 URL: https://issues.apache.org/jira/browse/ARROW-3142
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou


When setting ARROW_BUILD_TOOLCHAIN, gtest and orc are currently not taken from 
the toolchain environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3167) [CI] Limit clcache cache size

2018-09-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3167:
-

 Summary: [CI] Limit clcache cache size
 Key: ARROW-3167
 URL: https://issues.apache.org/jira/browse/ARROW-3167
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


The clcache cache on AppVeyor has a default max size of 1 GB and can reach 
close to this size (see e.g. 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/build/1.0.7722/job/5gp85w0m5xei0nme#L251).
 We should limit its size to something more reasonable to lower cache transfer 
/ compression times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2825) [C++] Need AllocateBuffer / AllocateResizableBuffer variant with default memory pool

2018-07-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2825:
-

 Summary: [C++] Need AllocateBuffer / AllocateResizableBuffer 
variant with default memory pool
 Key: ARROW-2825
 URL: https://issues.apache.org/jira/browse/ARROW-2825
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


It's not very practical that you have to pass the default memory pool 
explicitly to {{AllocateBuffer}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2826) [C++] Clarification needed between ArrayBuilder::Init(), Resize() and Reserve()

2018-07-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2826:
-

 Summary: [C++] Clarification needed between ArrayBuilder::Init(), 
Resize() and Reserve()
 Key: ARROW-2826
 URL: https://issues.apache.org/jira/browse/ARROW-2826
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


It's still not clear to me why we have three builder methods that seem to do 
essentially the same thing. This should be clarified somewhere in the 
docstrings.
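One plausible split the docstrings could spell out (this is an assumption 
about intended semantics, not Arrow's documented behavior) is that 
{{Resize()}} sets capacity to an absolute value while {{Reserve()}} ensures 
headroom for additional elements:

```python
# Toy model of the assumed distinction (not Arrow code): resize() is
# absolute, reserve() is relative to the current length.
class Builder:
    def __init__(self):
        self.capacity = 0
        self.length = 0

    def resize(self, capacity):
        # Set capacity to an absolute value.
        self.capacity = capacity

    def reserve(self, additional):
        # Ensure room for `additional` more elements beyond current length.
        if self.length + additional > self.capacity:
            self.resize(self.length + additional)

b = Builder()
b.reserve(16)
assert b.capacity == 16
b.resize(8)
assert b.capacity == 8
```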



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2838) [Python] Speed up null testing with Pandas semantics

2018-07-12 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2838:
-

 Summary: [Python] Speed up null testing with Pandas semantics
 Key: ARROW-2838
 URL: https://issues.apache.org/jira/browse/ARROW-2838
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


The {{PandasObjectIsNull}} helper function can be a significant cost contributor 
when converting a Pandas dataframe to Arrow format (e.g. when writing a 
dataframe to feather format). We can try to speed up the type checks in that 
function.
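A minimal sketch of the null semantics involved (simplified and assumed; the 
real helper also has to handle NaT and numpy scalar types):

```python
import math

# Simplified Pandas-like null test: None and float NaN count as null,
# everything else does not.
def is_null(obj):
    if obj is None:
        return True
    if isinstance(obj, float):
        return math.isnan(obj)
    return False

assert is_null(None)
assert is_null(float('nan'))
assert not is_null(0.0)
assert not is_null('x')
```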



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2277) [Python] Tensor.from_numpy doesn't support struct arrays

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2277:
-

 Summary: [Python] Tensor.from_numpy doesn't support struct arrays
 Key: ARROW-2277
 URL: https://issues.apache.org/jira/browse/ARROW-2277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
>>> dt.itemsize
5
>>> arr = np.arange(5*10, dtype=np.int8).view(dt)
>>> pa.Tensor.from_numpy(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pa.Tensor.from_numpy(arr)
  File "array.pxi", line 523, in pyarrow.lib.Tensor.from_numpy
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: 
/home/antoine/arrow/cpp/src/arrow/python/numpy_convert.cc:250 code: 
GetTensorType(reinterpret_cast(PyArray_DESCR(ndarray)), )
Unsupported numpy type 20

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2278) [Python] deserializing Numpy struct arrays raises

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2278:
-

 Summary: [Python] deserializing Numpy struct arrays raises
 Key: ARROW-2278
 URL: https://issues.apache.org/jira/browse/ARROW-2278
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> import numpy as np
>>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
>>> arr = np.arange(5*10, dtype=np.int8).view(dt)
>>> pa.deserialize(pa.serialize(arr).to_buffer())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pa.deserialize(pa.serialize(arr).to_buffer())
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in 
pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in 
pyarrow.lib.SerializationContext._deserialize_callback
  File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in 
_deserialize_numpy_array_list
return np.array(data[0], dtype=np.dtype(data[1]))
TypeError: a bytes-like object is required, not 'int'
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2288) [Python] slicing logic defective

2018-03-08 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2288:
-

 Summary: [Python] slicing logic defective
 Key: ARROW-2288
 URL: https://issues.apache.org/jira/browse/ARROW-2288
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


The slicing logic tends to go too far when normalizing large negative bounds, 
which leads to results not in line with Python's slicing semantics:
{code}
>>> arr = pa.array([1,2,3,4])
>>> arr[-99:100]

[
  2,
  3,
  4
]
>>> arr.to_pylist()[-99:100]
[1, 2, 3, 4]
>>> 
>>> 
>>> arr[-6:-5]

[
  3
]
>>> arr.to_pylist()[-6:-5]
[]
{code}
Also note this crash:
{code}
>>> arr[10:13]
/home/antoine/arrow/cpp/src/arrow/array.cc:105 Check failed: (offset) <= 
(data.length) 
Aborted (core dumped)
{code}
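The expected clamping can be checked against Python's own rules; a minimal 
sketch of the semantics the fix should match (not pyarrow's implementation):

```python
def normalize_slice(start, stop, length):
    # Clamp bounds the way Python's slice.indices() does (step 1 only).
    if start < 0:
        start = max(start + length, 0)
    else:
        start = min(start, length)
    if stop < 0:
        stop = max(stop + length, 0)
    else:
        stop = min(stop, length)
    return start, stop

# Matches the reporter's examples above:
assert normalize_slice(-99, 100, 4) == (0, 4)   # arr[-99:100] -> whole array
assert normalize_slice(-6, -5, 4) == (0, 0)     # arr[-6:-5]   -> empty
assert normalize_slice(10, 13, 4) == (4, 4)     # arr[10:13]   -> empty, no crash
# Cross-check against the builtin:
assert normalize_slice(-99, 100, 4) == slice(-99, 100).indices(4)[:2]
```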



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2237) [Python] Huge tables test failure

2018-03-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2237:
-

 Summary: [Python] Huge tables test failure
 Key: ARROW-2237
 URL: https://issues.apache.org/jira/browse/ARROW-2237
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou


This is a new failure here (Ubuntu 16.04, x86-64):
{code}
_ test_use_huge_pages _
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, in 
test_use_huge_pages
create_object(plasma_client, 1)
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in 
create_object
seal=seal)
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in 
create_object_with_id
memory_buffer = client.create(object_id, data_size, metadata)
  File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 
code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
/home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, 
, buffer)
Encountered unexpected EOF
 Captured stderr call -
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /mnt/hugepages and huge page support 
enabled
create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2238) [C++] Detect clcache in cmake configuration

2018-03-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2238:
-

 Summary: [C++] Detect clcache in cmake configuration
 Key: ARROW-2238
 URL: https://issues.apache.org/jira/browse/ARROW-2238
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


By default Windows builds should use clcache if installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2239) [C++] Update build docs for Windows

2018-03-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2239:
-

 Summary: [C++] Update build docs for Windows
 Key: ARROW-2239
 URL: https://issues.apache.org/jira/browse/ARROW-2239
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Documentation
Reporter: Antoine Pitrou
 Fix For: 0.9.0


We should update the C++ build docs for Windows to recommend use of Ninja and 
clcache for faster builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2311) [Python] Struct array slicing defective

2018-03-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2311:
-

 Summary: [Python] Struct array slicing defective
 Key: ARROW-2311
 URL: https://issues.apache.org/jira/browse/ARROW-2311
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{code:python}
>>> arr = pa.array([(1, 2.0), (3, 4.0), (5, 6.0)],
...                type=pa.struct([pa.field('x', pa.int16()), pa.field('y', pa.float32())]))
>>> arr

[
  {'x': 1, 'y': 2.0},
  {'x': 3, 'y': 4.0},
  {'x': 5, 'y': 6.0}
]
>>> arr[1:]

[
  {'x': 1, 'y': 2.0},
  {'x': 3, 'y': 4.0}
]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2315) [C++/Python] Add method to flatten a struct array

2018-03-15 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2315:
-

 Summary: [C++/Python] Add method to flatten a struct array
 Key: ARROW-2315
 URL: https://issues.apache.org/jira/browse/ARROW-2315
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See ARROW-1886. We want to be able to take a StructArray and flatten it into 
independent field arrays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2270) [Python] ForeignBuffer doesn't tie Python object lifetime to C++ buffer lifetime

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2270:
-

 Summary: [Python] ForeignBuffer doesn't tie Python object lifetime 
to C++ buffer lifetime
 Key: ARROW-2270
 URL: https://issues.apache.org/jira/browse/ARROW-2270
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{{ForeignBuffer}} keeps the reference to the Python base object in the Python 
wrapper class, not in the C++ buffer instance. As a result, if the C++ buffer 
gets passed around but the Python wrapper is destroyed, the reference to the 
original Python base object is released prematurely.
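The shape of the fix can be sketched with hypothetical stand-in classes (not 
pyarrow's): the buffer itself should own the base object, so the base outlives 
any short-lived wrapper.

```python
import array

# Hypothetical classes illustrating the ownership pattern: holding `base`
# on the buffer ties the base object's lifetime to the buffer's.
class Buffer:
    def __init__(self, data, base=None):
        self.data = data
        self.base = base

class Wrapper:
    def __init__(self, buf):
        self.buf = buf

src = array.array('b', b'hello')
buf = Buffer(memoryview(src), base=src)
w = Wrapper(buf)
del w                       # the wrapper dies; the buffer keeps src alive
assert bytes(buf.data) == b'hello'
```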



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2271) [Python] test_plasma could make errors more diagnosable

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2271:
-

 Summary: [Python] test_plasma could make errors more diagnosable
 Key: ARROW-2271
 URL: https://issues.apache.org/jira/browse/ARROW-2271
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou


Currently, when {{plasma_store}} fails for one reason or another, you get poorly 
readable errors from {{test_plasma.py}}. Displaying the child process' stderr 
would help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2287) [Python] chunked array not iterable, not indexable

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2287:
-

 Summary: [Python] chunked array not iterable, not indexable
 Key: ARROW-2287
 URL: https://issues.apache.org/jira/browse/ARROW-2287
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


It would be useful to access individual elements of a chunked array either 
through iteration or indexing.
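Flat indexing over chunks can be sketched as follows (hypothetical helper, not 
the pyarrow API): walk the chunks, subtracting each chunk's length until the 
index falls inside one.

```python
def chunked_getitem(chunks, i):
    # Map a flat index (possibly negative) onto a list of chunks.
    if i < 0:
        i += sum(len(c) for c in chunks)
    for chunk in chunks:
        if i < len(chunk):
            return chunk[i]
        i -= len(chunk)
    raise IndexError(i)

assert chunked_getitem([[1, 2], [3, 4, 5]], 3) == 4
assert chunked_getitem([[1, 2], [3, 4, 5]], -1) == 5
```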



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2284) [Python] test_plasma error on plasma_store error

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2284:
-

 Summary: [Python] test_plasma error on plasma_store error
 Key: ARROW-2284
 URL: https://issues.apache.org/jira/browse/ARROW-2284
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This appears caused by my latest changes:
{code:python}
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 192, in 
setup_method
    plasma_store_name, self.p = self.plasma_store_ctx.__enter__()
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/contextlib.py", 
line 81, in __enter__
    return next(self.gen)
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 168, in 
start_plasma_store
    err = proc.stderr.read().decode()
AttributeError: 'NoneType' object has no attribute 'read'
{code}
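The failure mode in isolation: {{Popen.stderr}} is {{None}} unless 
{{stderr=subprocess.PIPE}} was requested at process creation.

```python
import subprocess
import sys

# Without stderr=PIPE, the attribute is None (reading it raises
# AttributeError, as in the traceback above).
proc = subprocess.Popen([sys.executable, '-c', 'pass'])
proc.wait()
assert proc.stderr is None

# With stderr=PIPE, the child's stderr can be read back.
proc = subprocess.Popen(
    [sys.executable, '-c', 'import sys; sys.stderr.write("oops")'],
    stderr=subprocess.PIPE)
err = proc.stderr.read().decode()
proc.wait()
assert err == 'oops'
```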



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2286) [Python] Allow subscripting pyarrow.lib.StructValue

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2286:
-

 Summary: [Python] Allow subscripting pyarrow.lib.StructValue
 Key: ARROW-2286
 URL: https://issues.apache.org/jira/browse/ARROW-2286
 Project: Apache Arrow
  Issue Type: Wish
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> obj
{'x': 42, 'y': True}
>>> type(obj)
pyarrow.lib.StructValue
>>> obj['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    obj['x']
TypeError: 'pyarrow.lib.StructValue' object is not subscriptable
{code}
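A pure-Python sketch of the wished-for behavior, with a hypothetical stand-in class (the field storage and {{as_py()}}-style conversion are illustrative assumptions, not pyarrow's actual internals):

```python
class StructValueSketch:
    """Hypothetical stand-in for pyarrow.lib.StructValue, for illustration only."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def as_py(self):
        # Convert to a plain Python dict.
        return dict(self._fields)

    # The wished-for behavior: delegate subscripting to the field mapping.
    def __getitem__(self, key):
        return self._fields[key]

obj = StructValueSketch({'x': 42, 'y': True})
assert obj['x'] == 42
assert obj['y'] is True
```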





[jira] [Created] (ARROW-2285) [Python] Can't convert Numpy string arrays

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2285:
-

 Summary: [Python] Can't convert Numpy string arrays
 Key: ARROW-2285
 URL: https://issues.apache.org/jira/browse/ARROW-2285
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.array([b'foo', b'bar'], dtype='S3')
>>> pa.array(arr, type=pa.binary(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    pa.array(arr, type=pa.binary(3))
  File "array.pxi", line 177, in pyarrow.lib.array
  File "array.pxi", line 77, in pyarrow.lib._ndarray_to_array
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1661 code: converter.Convert()
NumPyConverter doesn't implement  conversion. 
{code}





[jira] [Created] (ARROW-2276) [Python] Tensor could implement the buffer protocol

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2276:
-

 Summary: [Python] Tensor could implement the buffer protocol
 Key: ARROW-2276
 URL: https://issues.apache.org/jira/browse/ARROW-2276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Tensors have an underlying buffer, a data type, shape and strides. It seems 
like they could implement the Python buffer protocol.





[jira] [Created] (ARROW-2275) [C++] Buffer::mutable_data_ member uninitialized

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2275:
-

 Summary: [C++] Buffer::mutable_data_ member uninitialized
 Key: ARROW-2275
 URL: https://issues.apache.org/jira/browse/ARROW-2275
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


For immutable buffers (i.e. most of them), the {{mutable_data_}} member is 
uninitialized. If the user calls {{mutable_data()}} by mistake on such a 
buffer, they will get a bogus pointer back.

This is exacerbated by the Tensor API whose const and non-const {{raw_data()}} 
methods return different things...

(also an idea: add a DCHECK for mutability before returning from 
{{mutable_data()}}?)





[jira] [Created] (ARROW-2309) [C++] Use std::make_unsigned

2018-03-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2309:
-

 Summary: [C++] Use std::make_unsigned
 Key: ARROW-2309
 URL: https://issues.apache.org/jira/browse/ARROW-2309
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{{arrow/util/bit-util.h}} has a reimplementation of {{boost::make_unsigned}}, 
but we could simply use {{std::make_unsigned}}, which is C++11.





[jira] [Created] (ARROW-2426) [CI] glib build failure

2018-04-09 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2426:
-

 Summary: [CI] glib build failure
 Key: ARROW-2426
 URL: https://issues.apache.org/jira/browse/ARROW-2426
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Antoine Pitrou


The glib build on Travis-CI fails:

[https://travis-ci.org/apache/arrow/jobs/364123364#L6840]

{code}
==> Installing gobject-introspection
==> Downloading 
https://homebrew.bintray.com/bottles/gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
==> Pouring gobject-introspection-1.56.0_1.sierra.bottle.tar.gz
  /usr/local/Cellar/gobject-introspection/1.56.0_1: 173 files, 9.8MB
Installing gobject-introspection has failed!
{code}





[jira] [Created] (ARROW-2442) [C++] Disambiguate Builder::Append overloads

2018-04-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2442:
-

 Summary: [C++] Disambiguate Builder::Append overloads
 Key: ARROW-2442
 URL: https://issues.apache.org/jira/browse/ARROW-2442
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


See discussion in 
[https://github.com/apache/arrow/pull/1852#discussion_r179919627]

There are various {{Append()}} overloads in Builder and subclasses, some of 
which append one value, some of which append multiple values at once.

The API might be clearer and less error-prone if multiple-append variants were 
named differently, for example {{AppendValues()}}. Especially with the 
pointer-taking variants, it's probably easy to call the wrong overload by 
mistake.

The existing methods would have to go through a deprecation cycle.





[jira] [Created] (ARROW-2400) [C++] Status destructor is expensive

2018-04-05 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2400:
-

 Summary: [C++] Status destructor is expensive
 Key: ARROW-2400
 URL: https://issues.apache.org/jira/browse/ARROW-2400
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Let's take the following micro-benchmark (in Python):

{code:bash}
$ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(1)]" "pa.array(data, type=pa.binary())"
1000 loops, best of 3: 784 usec per loop
{code}

If I replace the Status destructor with a no-op:
{code:c++}
  ~Status() { }
{code}

then the benchmark result becomes:
{code:bash}
$ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(1)]" "pa.array(data, type=pa.binary())"
1000 loops, best of 3: 561 usec per loop
{code}

This is almost a 30% win. I get similar results on the conversion benchmarks in 
the benchmark suite.

I'm unsure about the explanation. In the common case, {{delete _state}} should 
be extremely fast, since the state is NULL. Yet, it seems it adds significant 
overhead. Perhaps because of exception handling?





[jira] [Created] (ARROW-2389) [C++] Add StatusCode::OverflowError

2018-04-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2389:
-

 Summary: [C++] Add StatusCode::OverflowError
 Key: ARROW-2389
 URL: https://issues.apache.org/jira/browse/ARROW-2389
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


It may be useful to have a {{StatusCode::OverflowError}} return code, to signal 
that something overflowed allowed limits (e.g. the 2GB limit for string or 
binary values).





[jira] [Created] (ARROW-2390) [C++/Python] CheckPyError() could inspect exception type

2018-04-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2390:
-

 Summary: [C++/Python] CheckPyError() could inspect exception type
 Key: ARROW-2390
 URL: https://issues.apache.org/jira/browse/ARROW-2390
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Currently, {{CheckPyError}} always chooses an "unknown error" status. It could 
instead inspect the Python exception and choose, e.g., a "type error" status for 
a {{TypeError}} exception, etc.

See also ARROW-2389
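A conceptual pure-Python sketch of such inspection; the status names and the exception-to-status mapping are illustrative assumptions, not Arrow's actual codes:

```python
# Hypothetical mapping from Python exception types to status kinds.
# The first matching entry wins; anything else falls back to "UnknownError".
EXC_TO_STATUS = {
    TypeError: "TypeError",
    ValueError: "Invalid",
    NotImplementedError: "NotImplemented",
}

def status_kind_for(exc):
    """Pick a status kind based on the exception's type."""
    for exc_type, kind in EXC_TO_STATUS.items():
        if isinstance(exc, exc_type):
            return kind
    return "UnknownError"

try:
    int("not a number")  # raises ValueError
except Exception as e:
    assert status_kind_for(e) == "Invalid"

assert status_kind_for(TypeError("boom")) == "TypeError"
assert status_kind_for(KeyError("x")) == "UnknownError"
```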





[jira] [Created] (ARROW-2402) [C++] FixedSizeBinaryBuilder::Append lacks "const char*" overload

2018-04-05 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2402:
-

 Summary: [C++] FixedSizeBinaryBuilder::Append lacks "const char*" 
overload
 Key: ARROW-2402
 URL: https://issues.apache.org/jira/browse/ARROW-2402
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


This implies that calling {{FixedSizeBinaryBuilder::Append}} with a "const 
char*" argument currently instantiates a temporary {{std::string}}.




