[jira] [Resolved] (ARROW-6777) [GLib][CI] Unpin gobject-introspection gem

2019-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-6777.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5572
[https://github.com/apache/arrow/pull/5572]

> [GLib][CI] Unpin gobject-introspection gem
> --
>
> Key: ARROW-6777
> URL: https://issues.apache.org/jira/browse/ARROW-6777
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6777) [GLib][CI] Unpin gobject-introspection gem

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6777:
--
Labels: pull-request-available  (was: )

> [GLib][CI] Unpin gobject-introspection gem
> --
>
> Key: ARROW-6777
> URL: https://issues.apache.org/jira/browse/ARROW-6777
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-6777) [GLib][CI] Unpin gobject-introspection gem

2019-10-02 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6777:
---

 Summary: [GLib][CI] Unpin gobject-introspection gem
 Key: ARROW-6777
 URL: https://issues.apache.org/jira/browse/ARROW-6777
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-6774) [Rust] Reading parquet file is slow

2019-10-02 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-6774:
---
Summary: [Rust] Reading parquet file is slow  (was: Reading parquet file is 
slow)

> [Rust] Reading parquet file is slow
> ---
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Adam Lippai
>Priority: Major
>
> Using the example at 
> [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet 
> {code:none}
> let reader = SerializedFileReader::new(file).unwrap();
> let mut iter = reader.get_row_iter(None).unwrap();
> let start = Instant::now();
> while let Some(record) = iter.next() {}
> let duration = start.elapsed();
> println!("{:?}", duration);
> {code}
> It runs for 17 seconds on a ~160 MB parquet file.
> If there is a more efficient way to load a parquet file, it would be nice to 
> add it to the README.
> P.S.: My goal is to construct an ndarray from it; I'd be happy for any tips.





[jira] [Updated] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6760:
--
Labels: pull-request-available  (was: )

> [C++] JSON: improve error message when column changed type
> --
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Attachments: dummy.jl
>
>
> When a column accidentally changes type in a JSON file (which is not 
> supported), it would be nice to have the error message include the name of 
> the offending column.
> ---
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Commented] (ARROW-6776) [Python] Need a lite version of pyarrow

2019-10-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943279#comment-16943279
 ] 

Wes McKinney commented on ARROW-6776:
-

Our wheel build scripts are found here:

https://github.com/apache/arrow/tree/master/python/manylinux1

It's easy to build your own wheels; just follow the README.

We ship many optional components that you can turn off to make smaller 
wheels.

The duplicated shared library issue is 
https://issues.apache.org/jira/browse/ARROW-5082. You are welcome to try to 
resolve it. My team and I have decided not to spend time on wheel-related 
issues anymore, but other Arrow community members are welcome to do what they 
wish.

> [Python] Need a lite version of pyarrow
> ---
>
> Key: ARROW-6776
> URL: https://issues.apache.org/jira/browse/ARROW-6776
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.1
>Reporter: Haowei Yu
>Priority: Major
>
> Currently I am building a library package on top of pyarrow, so I include 
> pyarrow as a dependency and ship it to our customer. However, when our 
> customer installs our package, it also installs pyarrow and pyarrow's 
> dependency (numpy), and the dependency size is huge. 
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M 
> /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/ 
> total 186M
> {code}
> And numpy is around 80 MB, so the total is more than 250 MB.
> Our customer wants to bundle all dependencies and run the code inside AWS 
> Lambda, but they hit the size limit and failed to run the code.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and 
> without a version suffix. If one of them (either with or without the suffix) 
> could be removed, it would reduce the package size by at least half.
> Further, our library just wants to use IPC and read data as record batches; I 
> don't need Arrow Flight at all (which is the biggest .so file and takes 
> around 100 MB). I wonder if you could publish a lite version of pyarrow so 
> that I can specify the lite version as the dependency. Or maybe I need to 
> build my own lite version and push it to PyPI. However, this approach causes 
> further problems if our customer is using the "fat" version of pyarrow, 
> unless you change the namespace of the lite version.
> Another alternative is to bundle pyarrow with our library (copy the whole 
> directory into a vendored namespace) and ship it to our customer without 
> specifying pyarrow as a dependency. The advantage of this approach is that I 
> can build pyarrow with whatever options/sub-modules/libraries I need. 
> However, I tried a lot but failed, because pyarrow uses absolute imports and 
> it fails to import the scripts in the new location.
> Any insight into how I should resolve this issue?





[jira] [Updated] (ARROW-6776) [Python] Need a lite version of pyarrow

2019-10-02 Thread Haowei Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haowei Yu updated ARROW-6776:
-
Description: 
Currently I am building a library package on top of pyarrow, so I include 
pyarrow as a dependency and ship it to our customer. However, when our customer 
installs our package, it also installs pyarrow and pyarrow's dependency 
(numpy), and the dependency size is huge. 
{code:bash}
(py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M 
/home/hyu/py36env/lib/python3.6/site-packages/pyarrow/ 
total 186M
{code}
And numpy is around 80 MB, so the total is more than 250 MB.

Our customer wants to bundle all dependencies and run the code inside AWS 
Lambda, but they hit the size limit and failed to run the code.

Looking into pyarrow, I saw that multiple .so files are shipped both with and 
without a version suffix. If one of them (either with or without the suffix) 
could be removed, it would reduce the package size by at least half.

Further, our library just wants to use IPC and read data as record batches; I 
don't need Arrow Flight at all (which is the biggest .so file and takes around 
100 MB). I wonder if you could publish a lite version of pyarrow so that I can 
specify the lite version as the dependency. Or maybe I need to build my own 
lite version and push it to PyPI. However, this approach causes further 
problems if our customer is using the "fat" version of pyarrow, unless you 
change the namespace of the lite version.

Another alternative is to bundle pyarrow with our library (copy the whole 
directory into a vendored namespace) and ship it to our customer without 
specifying pyarrow as a dependency. The advantage of this approach is that I 
can build pyarrow with whatever options/sub-modules/libraries I need. However, 
I tried a lot but failed, because pyarrow uses absolute imports and it fails to 
import the scripts in the new location.

Any insight into how I should resolve this issue?

  was:
Currently I am building a library packages on top of pyarrow, so I include 
pyarrow as a dependency and ship it to our customer. However, when our customer 
installed our packages, it will also install pyarrow and pyarrow's dependency 
(numpy). However the dependency size is huge. 
{code:bash}
(py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M 
/home/hyu/py36env/lib/python3.6/site-packages/pyarrow/ 
total 186M
{code}
 And numpy is around 80MB. Total is more than 250 MB.

Our customer want to bundle all dependency and run the code inside AWS Lambda, 
however they hit the size limit and failed to run the code.

Looking into the pyarrow, I saw multiple .so files are shipped both with and 
without version suffix, I wonder if you can remove the one of them (either with 
or without suffix), it will at least reduce the package size by half.

Further, our library just want to use IPC and read data as record batch, I 
don't need arrow flight at all (which is the biggest .so file and takes around 
100 MB). I wonder if you can push a lite version of the pyarrow so that I can 
specify lite version as the dependency. Or maybe I need to build my own lite 
version and push it pypi. However, this approach cause further problem if our 
customer is using the "fat" version of pyarrow unless you the change the 
namespace of lite version of pyarrow.

Another alternative is that I bundle the pyarrow with our library ( copy the 
whole directory into vendored namespace) and ship it to our customer without 
specifying pyarrow as a dependency. The advantage of this one is that I can 
build pyarrow with whatever option/sub-module/libraries I need. However, I 
tried a lot but failed because pyarrow use absolute import and it will fail to 
import the script in the new location. 

Any insight how I should resolve this issue?


> [Python] Need a lite version of pyarrow
> ---
>
> Key: ARROW-6776
> URL: https://issues.apache.org/jira/browse/ARROW-6776
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.14.1
>Reporter: Haowei Yu
>Priority: Major
>
> Currently I am building a library packages on top of pyarrow, so I include 
> pyarrow as a dependency and ship it to our customer. However, when our 
> customer installed our packages, it will also install pyarrow and pyarrow's 
> dependency (numpy). However the dependency size is huge. 
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M 
> /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/ 
> total 186M
> {code}
>  And numpy is around 80MB. Total is more than 250 MB.
> Our customer want to bundle all dependency and run the code inside AWS 
> Lambda, however they hit the size limit and failed to run the code.
> 

[jira] [Created] (ARROW-6776) [Python] Need a lite version of pyarrow

2019-10-02 Thread Haowei Yu (Jira)
Haowei Yu created ARROW-6776:


 Summary: [Python] Need a lite version of pyarrow
 Key: ARROW-6776
 URL: https://issues.apache.org/jira/browse/ARROW-6776
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.14.1
Reporter: Haowei Yu


Currently I am building a library package on top of pyarrow, so I include 
pyarrow as a dependency and ship it to our customer. However, when our customer 
installs our package, it also installs pyarrow and pyarrow's dependency 
(numpy), and the dependency size is huge. 
{code:bash}
(py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M 
/home/hyu/py36env/lib/python3.6/site-packages/pyarrow/ 
total 186M
{code}
And numpy is around 80 MB, so the total is more than 250 MB.

Our customer wants to bundle all dependencies and run the code inside AWS 
Lambda, but they hit the size limit and failed to run the code.

Looking into pyarrow, I saw that multiple .so files are shipped both with and 
without a version suffix. If one of them (either with or without the suffix) 
could be removed, it would reduce the package size by at least half.

Further, our library just wants to use IPC and read data as record batches; I 
don't need Arrow Flight at all (which is the biggest .so file and takes around 
100 MB). I wonder if you could publish a lite version of pyarrow so that I can 
specify the lite version as the dependency. Or maybe I need to build my own 
lite version and push it to PyPI. However, this approach causes further 
problems if our customer is using the "fat" version of pyarrow, unless you 
change the namespace of the lite version.

Another alternative is to bundle pyarrow with our library (copy the whole 
directory into a vendored namespace) and ship it to our customer without 
specifying pyarrow as a dependency. The advantage of this approach is that I 
can build pyarrow with whatever options/sub-modules/libraries I need. However, 
I tried a lot but failed, because pyarrow uses absolute imports and it fails to 
import the scripts in the new location.

Any insight into how I should resolve this issue?
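To see which files dominate the installed size, one can walk the package directory from Python. The helper below is a hypothetical utility sketched for illustration (it is not part of pyarrow or the original report); one might call it as `dir_size_mb(os.path.dirname(pyarrow.__file__))`:

```python
import os

def dir_size_mb(path):
    """Total size of all regular files under *path*, in megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Sum actual on-disk file sizes; symlinked duplicates count twice,
            # which mirrors what a naive bundler would ship.
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e6
```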






[jira] [Resolved] (ARROW-6761) [Rust] Travis CI builds not respecting rust-toolchain

2019-10-02 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6761.

Resolution: Fixed

Issue resolved by pull request 5561
[https://github.com/apache/arrow/pull/5561]

> [Rust] Travis CI builds not respecting rust-toolchain
> -
>
> Key: ARROW-6761
> URL: https://issues.apache.org/jira/browse/ARROW-6761
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Travis builds recently started failing with a Rust ICE (Internal Compiler 
> Error), which has been reported to the Rust compiler team 
> ([https://github.com/rust-lang/rust/issues/64908]).
>  





[jira] [Created] (ARROW-6775) Proposal for several Array utility functions

2019-10-02 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-6775:


 Summary: Proposal for several Array utility functions
 Key: ARROW-6775
 URL: https://issues.apache.org/jira/browse/ARROW-6775
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Zhuo Peng


Hi,

We developed several utilities that compute or access certain properties of 
Arrays, and we wonder whether it makes sense to get them into upstream (into 
both the C++ API and pyarrow). Assuming yes, where is the best place to put 
them?

Maybe I have overlooked existing APIs that already do the same; in that case, 
please point them out.

 

1/ ListLengthFromListArray(ListArray&)

Returns the lengths of the lists in a ListArray, as an Int32Array (or 
Int64Array for large lists). For example:

[[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned 
array can be converted to numpy)

 

2/ GetBinaryArrayTotalByteSize(BinaryArray&)

Returns the total byte size of a BinaryArray (basically offset[len - 1] - 
offset[0]).

Alternatively, a BinaryArray::Flatten() -> Uint8Array would work.

 

3/ GetArrayNullBitmapAsByteArray(Array&)

Returns the array's null bitmap as a UInt8Array (which can be efficiently 
converted to a boolean numpy array).

 

4/ GetFlattenedArrayParentIndices(ListArray&)

Makes an int32 array of the same length as the flattened ListArray. 
returned_array[i] == j means that the i-th element in the flattened ListArray 
came from the j-th list in the ListArray.


For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]

 





[jira] [Created] (ARROW-6774) Reading parquet file is slow

2019-10-02 Thread Adam Lippai (Jira)
Adam Lippai created ARROW-6774:
--

 Summary: Reading parquet file is slow
 Key: ARROW-6774
 URL: https://issues.apache.org/jira/browse/ARROW-6774
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.15.0
Reporter: Adam Lippai


Using the example at [https://github.com/apache/arrow/tree/master/rust/parquet] 
is slow.

The snippet 
{code:none}
let reader = SerializedFileReader::new(file).unwrap();
let mut iter = reader.get_row_iter(None).unwrap();
let start = Instant::now();
while let Some(record) = iter.next() {}
let duration = start.elapsed();
println!("{:?}", duration);
{code}
It runs for 17 seconds on a ~160 MB parquet file.

If there is a more efficient way to load a parquet file, it would be nice to 
add it to the README.

P.S.: My goal is to construct an ndarray from it; I'd be happy for any tips.





[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-6766:
---
Description: 
After following the instructions found in the developer guides for Python, I was
able to build fine by using:

{code:bash}
# Assuming immediately prior one has run:
# $ git clone g...@github.com:apache/arrow.git
# $ conda create -y -n pyarrow-dev -c conda-forge \
#     --file arrow/ci/conda_env_unix.yml \
#     --file arrow/ci/conda_env_cpp.yml \
#     --file arrow/ci/conda_env_python.yml \
#     compilers \
#     python=3.7
# $ conda activate pyarrow-dev
# $ brew update && brew bundle --file=arrow/cpp/Brewfile
export ARROW_HOME=$(pwd)/arrow/dist
export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
export CC=`which clang`
export CXX=`which clang++`
mkdir arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DCMAKE_INSTALL_LIBDIR=lib \
  -DARROW_FLIGHT=OFF \
  -DARROW_GANDIVA=OFF \
  -DARROW_ORC=ON \
  -DARROW_PARQUET=ON \
  -DARROW_PYTHON=ON \
  -DARROW_PLASMA=ON \
  -DARROW_BUILD_TESTS=ON \
  ..
make -j4
make install
popd
{code}

But when I run:

{code:bash}
pushd arrow/python
export PYARROW_WITH_FLIGHT=0
export PYARROW_WITH_GANDIVA=0
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd
{code}

I get the following errors:

{code:none}
-- Build output directory: 
/Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
-- Found the Arrow core library: 
/usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
-- Found the Arrow Python library: 
/usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
does not exist.
...
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:315 (bundle_arrow_lib)
CMake Error: File 
/usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
exist.
CMake Error at CMakeLists.txt:226 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
CMake Error: File 
/usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
{code}

 

What is quite strange is that the libraries do seem to be there, but they
have an additional version component, e.g. `libarrow.15.dylib`:

{code:none}
$ ls -l libarrow_python.15.dylib && echo $PWD
lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->
libarrow_python.15.0.0.dylib
/Users/tallamjr/github/arrow/dist/lib
{code}

I am not exactly sure what the issue is, but it appears that the version is not 
captured in a variable used by CMake. I have run the same setup on `master` 
(`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both seem to 
produce the same errors.

Apologies if this is not quite the format for JIRA issues here, or perhaps
it's not the correct platform for this; I'm very new to the project and to
contributing to Apache in general. Thanks.

 

  was:
{{After following the instructions found on the developer guides for Python, I 
was}}
 {{able to build fine by using:}}

{{# Assuming immediately prior one has run:}}
 {{# $ git clone g...@github.com:apache/arrow.git}}
 # $ conda create -y -n pyarrow-dev -c conda-forge 
 #   --file arrow/ci/conda_env_unix.yml 
 #   --file arrow/ci/conda_env_cpp.yml 
 #   --file arrow/ci/conda_env_python.yml 
 #    compilers 
 {{#  python=3.7}}
 {{# $ conda activate pyarrow-dev}}
 {{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
ARROW_HOME=$(pwd)/arrow/dist}}
 {{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
CC=`which clang`}}
 {{export CXX=`which clang++`}}{{mkdir arrow/cpp/build \}}
    pushd arrow/cpp/build \
     cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
     -DCMAKE_INSTALL_LIBDIR=lib \
     -DARROW_FLIGHT=OFF \
     -DARROW_GANDIVA=OFF \
     -DARROW_ORC=ON \
     -DARROW_PARQUET=ON \
     -DARROW_PYTHON=ON \
     -DARROW_PLASMA=ON \
     -DARROW_BUILD_TESTS=ON \
    ..
 {{make -j4}}
 {{make install}}
 {{popd}}

But when I run:

{{pushd arrow/python}}
 {{export PYARROW_WITH_FLIGHT=1}}
 {{export PYARROW_WITH_GANDIVA=1}}
 {{export PYARROW_WITH_ORC=1}}
 {{export PYARROW_WITH_PARQUET=1}}
 {{python setup.py build_ext --inplace}}
 {{popd}}

I get the following errors:

{{-- Build output 

[jira] [Assigned] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice

2019-10-02 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6773:
--

Assignee: Neal Richardson  (was: Ben Kietzman)

> [C++] Filter kernel returns invalid data when filtering with an Array slice
> ---
>
> Key: ARROW-6773
> URL: https://issues.apache.org/jira/browse/ARROW-6773
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> See ARROW-3808. This failing test reproduces the issue:
> {code:java}
> --- a/cpp/src/arrow/compute/kernels/filter_test.cc
> +++ b/cpp/src/arrow/compute/kernels/filter_test.cc
> @@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) {
>this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]");
>this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]");
>  
> +  this->AssertFilterArrays(
> +ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"),
> +ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3),
> +ArrayFromJSON(this->type_singleton(), "[7, 9]")
> +  );
> +
> {code}
> {code:java}
> arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure
> Failed
> @@ -2, +2 @@
> +0
> [  FAILED  ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = 
> arrow::DoubleType (0 ms)
> {code}
>  
>  





[jira] [Commented] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943073#comment-16943073
 ] 

Wes McKinney commented on ARROW-6766:
-

Looks like your build isn't picking up the library SO/ABI version correctly. 
I'm not sure what's wrong; someone else may have an idea.

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> {{After following the instructions found on the developer guides for Python, 
> I was}}
>  {{able to build fine by using:}}
> {{# Assuming immediately prior one has run:}}
>  {{# $ git clone g...@github.com:apache/arrow.git}}
>  # $ conda create -y -n pyarrow-dev -c conda-forge 
>  #   --file arrow/ci/conda_env_unix.yml 
>  #   --file arrow/ci/conda_env_cpp.yml 
>  #   --file arrow/ci/conda_env_python.yml 
>  #    compilers 
>  {{#  python=3.7}}
>  {{# $ conda activate pyarrow-dev}}
>  {{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
> ARROW_HOME=$(pwd)/arrow/dist}}
>  {{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
> CC=`which clang`}}
>  {{export CXX=`which clang++`}}{{mkdir arrow/cpp/build \}}
>     pushd arrow/cpp/build \
>      cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>      -DCMAKE_INSTALL_LIBDIR=lib \
>      -DARROW_FLIGHT=OFF \
>      -DARROW_GANDIVA=OFF \
>      -DARROW_ORC=ON \
>      -DARROW_PARQUET=ON \
>      -DARROW_PYTHON=ON \
>      -DARROW_PLASMA=ON \
>      -DARROW_BUILD_TESTS=ON \
>     ..
>  {{make -j4}}
>  {{make install}}
>  {{popd}}
> But when I run:
> {{pushd arrow/python}}
>  {{export PYARROW_WITH_FLIGHT=1}}
>  {{export PYARROW_WITH_GANDIVA=1}}
>  {{export PYARROW_WITH_ORC=1}}
>  {{export PYARROW_WITH_PARQUET=1}}
>  {{python setup.py build_ext --inplace}}
>  {{popd}}
> I get the following errors:
> {{-- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release}}
>  {{-- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib}}
>  {{-- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib}}
>  {{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not 
> exist.}}{{...}}{{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.}}
>  {{CMake Error at CMakeLists.txt:230 (configure_file):}}
>  \{{ configure_file Problem configuring file}}
>  {{Call Stack (most recent call first):}}
>  \{{ CMakeLists.txt:315 (bundle_arrow_lib)}}
>  {{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.}}
>  {{CMake Error at CMakeLists.txt:226 (configure_file):}}
>  \{{ configure_file Problem configuring file}}
>  {{Call Stack (most recent call first):}}
>  \{{ CMakeLists.txt:320 (bundle_arrow_lib)}}
>  {{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.}}
>  {{CMake Error at CMakeLists.txt:230 (configure_file):}}
>  \{{ configure_file Problem configuring file}}
>  {{Call Stack (most recent call first):}}
>  \{{ CMakeLists.txt:320 (bundle_arrow_lib)}}
>  
> What is quite strange is that the libraries seem to indeed be there but they
>  have an addition component such as `libarrow.15.dylib` .e.g:
> {{$ ls -l libarrow_python.15.dylib && echo $PWD}}
>  {{lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->}}
>  {{libarrow_python.15.0.0.dylib}}
>  {{/Users/tallamjr/github/arrow/dist/lib}}
> I guess I am not exactly sure what the issue here is but it appears to be that
>  the version is not captured as a variable that is used by CMAKE? I have run 
> the
>  same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`)
>  which both seem to produce same errors.
> Apologies if this is not quite the format for JIRA issues here or perhaps if
>  it's not the correct platform for this, I'm very new to the project and
>  contributing to apache in general. Thanks
>  





[jira] [Created] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice

2019-10-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6773:
--

 Summary: [C++] Filter kernel returns invalid data when filtering 
with an Array slice
 Key: ARROW-6773
 URL: https://issues.apache.org/jira/browse/ARROW-6773
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 1.0.0


See ARROW-3808. This failing test reproduces the issue:
{code:java}
--- a/cpp/src/arrow/compute/kernels/filter_test.cc
+++ b/cpp/src/arrow/compute/kernels/filter_test.cc
@@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) {
   this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]");
   this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]");
 
+  this->AssertFilterArrays(
+ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"),
+ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3),
+ArrayFromJSON(this->type_singleton(), "[7, 9]")
+  );
+
{code}
{code:java}
arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure
Failed

@@ -2, +2 @@
+0

[  FAILED  ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = 
arrow::DoubleType (0 ms)
{code}




[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6766:

Priority: Major  (was: Blocker)

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Major
>
> After following the instructions found in the developer guide for Python, I 
> was able to build fine by using:
> {code:bash}
> # Assuming immediately prior one has run:
> # $ git clone g...@github.com:apache/arrow.git
> # $ conda create -y -n pyarrow-dev -c conda-forge \
> #     --file arrow/ci/conda_env_unix.yml \
> #     --file arrow/ci/conda_env_cpp.yml \
> #     --file arrow/ci/conda_env_python.yml \
> #     compilers \
> #     python=3.7
> # $ conda activate pyarrow-dev
> # $ brew update && brew bundle --file=arrow/cpp/Brewfile
> export ARROW_HOME=$(pwd)/arrow/dist
> export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
> export CC=`which clang`
> export CXX=`which clang++`
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_FLIGHT=OFF \
>       -DARROW_GANDIVA=OFF \
>       -DARROW_ORC=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_PLASMA=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
> {code}
> But when I run:
> {code:bash}
> pushd arrow/python
> export PYARROW_WITH_FLIGHT=1
> export PYARROW_WITH_GANDIVA=1
> export PYARROW_WITH_ORC=1
> export PYARROW_WITH_PARQUET=1
> python setup.py build_ext --inplace
> popd
> {code}
> I get the following errors:
> {code}
> -- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
> -- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
> -- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
> ...
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:315 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:226 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.
> CMake Error at CMakeLists.txt:230 (configure_file):
>   configure_file Problem configuring file
> Call Stack (most recent call first):
>   CMakeLists.txt:320 (bundle_arrow_lib)
> {code}
> What is quite strange is that the libraries do seem to be there, but they
>  have an additional version component, e.g. `libarrow.15.dylib`:
> {code}
> $ ls -l libarrow_python.15.dylib && echo $PWD
> lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->
> libarrow_python.15.0.0.dylib
> /Users/tallamjr/github/arrow/dist/lib
> {code}
> I am not exactly sure what the issue is, but it appears that the library
>  version is not captured in the variable used by CMake. I have run the
>  same setup on `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`),
>  which both produce the same errors.
> Apologies if this is not quite the format for JIRA issues here, or if
>  it's not the correct platform for this; I'm very new to the project and to
>  contributing to Apache in general. Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6581) [C++] Fix fuzzit job submission

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6581:
---
Summary: [C++] Fix fuzzit job submission  (was: [C++] Fuzzing job broken)

> [C++] Fix fuzzit job submission
> ---
>
> Key: ARROW-6581
> URL: https://issues.apache.org/jira/browse/ARROW-6581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> See [https://circleci.com/gh/ursa-labs/crossbow/2978]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6581) [C++] Fuzzing job broken

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6581.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5407
[https://github.com/apache/arrow/pull/5407]

> [C++] Fuzzing job broken
> 
>
> Key: ARROW-6581
> URL: https://issues.apache.org/jira/browse/ARROW-6581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> See [https://circleci.com/gh/ursa-labs/crossbow/2978]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6581) [C++] Fuzzing job broken

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6581:
--

Assignee: Antoine Pitrou

> [C++] Fuzzing job broken
> 
>
> Key: ARROW-6581
> URL: https://issues.apache.org/jira/browse/ARROW-6581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> See [https://circleci.com/gh/ursa-labs/crossbow/2978]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6759) [JS] Run less comprehensive every-commit build, relegate multi-target builds perhaps to nightlies

2019-10-02 Thread Paul Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943014#comment-16943014
 ] 

Paul Taylor commented on ARROW-6759:


Yeah no sweat, we can change the `ci/travis_script_js.sh` build and test 
commands to only test the UMD builds. Historically these have the most issues 
since they're minified, so if they pass everything should pass:

{code:bash}
npm run build -- -m umd -t es5 -t es2015 -t esnext
npm test -- -m umd -t es5 -t es2015 -t esnext
{code}


> [JS] Run less comprehensive every-commit build, relegate multi-target builds 
> perhaps to nightlies
> -
>
> Key: ARROW-6759
> URL: https://issues.apache.org/jira/browse/ARROW-6759
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> The JavaScript CI build is taking 25-30 minutes nowadays. This could be 
> abbreviated by testing fewer deployment targets. We obviously still need to 
> test all the deployment targets but we could do that nightly instead of on 
> every commit



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6772) [C++] Add operator== for interfaces with an Equals() method

2019-10-02 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943006#comment-16943006
 ] 

Ben Kietzman commented on ARROW-6772:
-

{{operator==}} for Schemas was added by 
[https://github.com/apache/arrow/pull/5529]

> [C++] Add operator== for interfaces with an Equals() method
> ---
>
> Key: ARROW-6772
> URL: https://issues.apache.org/jira/browse/ARROW-6772
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>
> A common pattern in tests is {{ASSERT_TRUE(schm->Equals(*other))}}. The 
> addition of overloaded equality operators will allow this to be written 
> {{ASSERT_EQ(*schm, *other)}}, which is more idiomatic GTest usage and will 
> allow more informative assertion failure messages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6756) [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`

2019-10-02 Thread bb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943003#comment-16943003
 ] 

bb commented on ARROW-6756:
---

All the libhdfs docs i found said:
{quote}The libhdfs APIs are a subset of the [Hadoop FileSystem 
APIs|https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html].
{quote}
And goes on to say that:
{quote}The header file for libhdfs describes each API in detail and is 
available in {{$HADOOP_PREFIX/src/c++/libhdfs/hdfs.h}}
{quote}
I grepped hdfs.h and didn't see any explicit "acl" reference, nor could I find 
it in the Hadoop GitHub repo, but it's possible that the API is referenced 
under another name or I am looking in the wrong places?

> [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`
> ---
>
> Key: ARROW-6756
> URL: https://issues.apache.org/jira/browse/ARROW-6756
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: bb
>Priority: Major
>  Labels: arrow, hdfs, pyarrow
>
> Extended HDFS filesystem attributes are exposed through the `getfacl` command.
> It would be immensely helpful to have this information accessible via:
> {code:java}
> pyarrow.hdfs.HadoopFileSystem{code}
>  
> Link to the official Hadoop docs where this is discussed in more detail:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getfacl]
> Sample output from the *nix shell:
> {code:java}
> $ hadoop fs -getfacl /path/to/hdfs/dir
>  # file: /path/to/hdfs/dir
>  # owner: hive
>  # group: hive
>  user::rwx
>  group:unix_group_with_acl_privs_defined:rwx
>  group::---
>  user:hive:rwx
>  group:hive:rwx
>  mask::rwx
>  other::--x{code}
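If such an API were exposed, the raw {{getfacl}} text would still need parsing into a structured form. A minimal stdlib sketch using the sample output above (`parse_getfacl` is a hypothetical helper, not a pyarrow API):

```python
def parse_getfacl(text):
    """Parse `hadoop fs -getfacl` style output into (metadata, ACL entries)."""
    meta, entries = {}, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("#"):
            # Header lines like "# owner: hive"
            key, _, value = line[1:].strip().partition(": ")
            meta[key] = value
        elif line:
            # Entry lines like "user:hive:rwx" or "user::rwx"
            kind, name, perms = line.split(":", 2)
            entries.append({"type": kind, "name": name, "perms": perms})
    return meta, entries

sample = """\
# file: /path/to/hdfs/dir
# owner: hive
# group: hive
user::rwx
group:hive:rwx
other::--x"""

meta, entries = parse_getfacl(sample)
print(meta["owner"])  # hive
print(entries[0])     # {'type': 'user', 'name': '', 'perms': 'rwx'}
```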



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery

2019-10-02 Thread Benjamin Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kietzman resolved ARROW-6614.
--
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5529
[https://github.com/apache/arrow/pull/5529]

> [C++][Dataset] Implement FileSystemDataSourceDiscovery
> --
>
> Key: ARROW-6614
> URL: https://issues.apache.org/jira/browse/ARROW-6614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> DataSourceDiscovery is what allows inferring a Schema and constructing a 
> DataSource with a PartitionScheme.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery

2019-10-02 Thread Benjamin Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kietzman reassigned ARROW-6614:


Assignee: Francois Saint-Jacques

> [C++][Dataset] Implement FileSystemDataSourceDiscovery
> --
>
> Key: ARROW-6614
> URL: https://issues.apache.org/jira/browse/ARROW-6614
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> DataSourceDiscovery is what allows inferring a Schema and constructing a 
> DataSource with a PartitionScheme.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6772) [C++] Add operator== for interfaces with an Equals() method

2019-10-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-6772:
---

 Summary: [C++] Add operator== for interfaces with an Equals() 
method
 Key: ARROW-6772
 URL: https://issues.apache.org/jira/browse/ARROW-6772
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman
Assignee: Ben Kietzman


A common pattern in tests is {{ASSERT_TRUE(schm->Equals(*other))}}. The addition 
of overloaded equality operators will allow this to be written 
{{ASSERT_EQ(*schm, *other)}}, which is more idiomatic GTest usage and will 
allow more informative assertion failure messages.
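The same convenience pattern appears in language bindings as well: delegate the equality operator to the existing {{Equals()}} method. A toy Python sketch of the idea (the `Schema` class here is a stand-in, not Arrow's):

```python
class Schema:
    """Stand-in for an Arrow-style class that already has an equals() method."""

    def __init__(self, fields):
        self.fields = fields

    def equals(self, other):
        return self.fields == other.fields

    # Delegating __eq__/__ne__ to equals() is the Python analogue of adding
    # a C++ operator==: test assertions can then compare objects directly,
    # which yields more informative failure messages than assertTrue.
    def __eq__(self, other):
        return isinstance(other, Schema) and self.equals(other)

    def __ne__(self, other):
        return not self.__eq__(other)

a, b, c = Schema(["x"]), Schema(["x"]), Schema(["y"])
assert a == b   # instead of assert a.equals(b)
assert a != c
print("ok")
```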



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6771) [Packaging][Python] Missing pytest dependency from conda and wheel builds

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6771:
--
Labels: pull-request-available  (was: )

> [Packaging][Python] Missing pytest dependency from conda and wheel builds
> -
>
> Key: ARROW-6771
> URL: https://issues.apache.org/jira/browse/ARROW-6771
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Multiple python packaging nightlies are failing:
> {code}
> Failed Tasks:
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py37
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py36
> - wheel-manylinux1-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux1-cp27mu
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py27
> - wheel-osx-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-osx-cp27m
> - docker-spark-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-circle-docker-spark-integration
> - wheel-win-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp35m
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py37
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py37
> - wheel-manylinux2010-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux2010-cp27mu
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py36
> - wheel-win-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp37m
> - wheel-win-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp36m
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-gandiva-jar-osx
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py27
> {code}
> Because of the missing, recently introduced pytest-lazy-fixture test 
> dependency:
> {code}
> + pytest -m 'not requires_testing_data' --pyargs pyarrow
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
> hypothesis profile 'default' ->
> database=DirectoryBasedExampleDatabase('$SRC_DIR/.hypothesis/examples')
> rootdir: $SRC_DIR
> plugins: hypothesis-4.38.1
> collected 1437 items / 1 errors / 3 deselected / 5 skipped / 1428 selected
>  ERRORS 
> 
> __ ERROR collecting tests/test_fs.py 
> ___
> ../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python3.7/site-packages/pyarrow/tests/test_fs.py:91:
> in 
> pytest.lazy_fixture('localfs'),
> E AttributeError: module 'pytest' has no attribute 'lazy_fixture'
> === warnings summary 
> ===
> $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324
> $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324:
> PytestUnknownMarkWarning: Unknown pytest.mark.s3 - is this a typo? You
> can register custom marks to avoid this warning - for details, see
> https://docs.pytest.org/en/latest/mark.html
> PytestUnknownMarkWarning,
> -- Docs: https://docs.pytest.org/en/latest/warnings.html
> !!! Interrupted: 1 errors during collection 
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6771) [Packaging][Python] Missing pytest dependency from conda and wheel builds

2019-10-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6771:
--

 Summary: [Packaging][Python] Missing pytest dependency from conda 
and wheel builds
 Key: ARROW-6771
 URL: https://issues.apache.org/jira/browse/ARROW-6771
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 1.0.0


Multiple python packaging nightlies are failing:

{code}
Failed Tasks:
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py36
- wheel-manylinux1-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux1-cp27mu
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py27
- wheel-osx-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-osx-cp27m
- docker-spark-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-circle-docker-spark-integration
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp35m
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py37
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py37
- wheel-manylinux2010-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux2010-cp27mu
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py36
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp37m
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp36m
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-gandiva-jar-osx
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py27
{code}

Because of the missing, recently introduced pytest-lazy-fixture test dependency:
{code}
+ pytest -m 'not requires_testing_data' --pyargs pyarrow
= test session starts ==
platform linux -- Python 3.7.3, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
hypothesis profile 'default' ->
database=DirectoryBasedExampleDatabase('$SRC_DIR/.hypothesis/examples')
rootdir: $SRC_DIR
plugins: hypothesis-4.38.1
collected 1437 items / 1 errors / 3 deselected / 5 skipped / 1428 selected

 ERRORS 
__ ERROR collecting tests/test_fs.py ___
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python3.7/site-packages/pyarrow/tests/test_fs.py:91:
in 
pytest.lazy_fixture('localfs'),
E AttributeError: module 'pytest' has no attribute 'lazy_fixture'
=== warnings summary ===
$PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324
$PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324:
PytestUnknownMarkWarning: Unknown pytest.mark.s3 - is this a typo? You
can register custom marks to avoid this warning - for details, see
https://docs.pytest.org/en/latest/mark.html
PytestUnknownMarkWarning,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!! Interrupted: 1 errors during collection 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6770:
---
Description: To remove verbose output 
https://travis-ci.org/pitrou/arrow/jobs/592577525#L191

> [CI][Travis] Download Minio quietly
> ---
>
> Key: ARROW-6770
> URL: https://issues.apache.org/jira/browse/ARROW-6770
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To remove verbose output 
> https://travis-ci.org/pitrou/arrow/jobs/592577525#L191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6770:
--
Labels: pull-request-available  (was: )

> [CI][Travis] Download Minio quietly
> ---
>
> Key: ARROW-6770
> URL: https://issues.apache.org/jira/browse/ARROW-6770
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5655:
--
Labels: pull-request-available  (was: )

> [Python] Table.from_pydict/from_arrays not using types in specified schema 
> correctly 
> -
>
> Key: ARROW-5655
> URL: https://issues.apache.org/jira/browse/ARROW-5655
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from 
> https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [15]: table = pa.Table.from_pydict(
> ...: {'a': [1, 2, 3], 'b': [3, 4, 5]},
> ...: schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]: 
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]: 
>a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema has 1) different column names and 2) a 
> non-default type (int32 vs int64), which leads to corrupted values.
> This is partly due to {{Table.from_pydict}} not using the type information in 
> the schema to convert the dictionary items to pyarrow arrays. But then it is 
> also {{Table.from_arrays}} that is not correctly casting the arrays to 
> another dtype if the schema specifies as such.
> Additional question for {{Table.from_pydict}} is whether it actually should 
> override the 'b' key from the dictionary as column 'c' as defined in the 
> schema (this behaviour depends on the order of the dictionary, which is not 
> guaranteed below Python 3.6).
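The corrupted column above ({{[3, 0, 4]}} instead of {{[3, 4, 5]}}) is exactly what you get when an int64 buffer is reinterpreted as int32 instead of being cast. A stdlib sketch reproducing the arithmetic (assuming little-endian storage; this models the symptom, not Arrow's internals):

```python
import struct

values = [3, 4, 5]                          # column 'b', built as int64
raw = struct.pack("<3q", *values)           # little-endian int64 buffer (24 bytes)
as_int32 = list(struct.unpack("<6i", raw))  # reinterpret the bytes as int32

print(as_int32)      # [3, 0, 4, 0, 5, 0] -- each int64 splits into two int32s
print(as_int32[:3])  # [3, 0, 4] -- matches the corrupted column in the report
```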



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-5855) [Python] Add support for Duration type

2019-10-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche reassigned ARROW-5855:


Assignee: Joris Van den Bossche

> [Python] Add support for Duration type
> --
>
> Key: ARROW-5855
> URL: https://issues.apache.org/jira/browse/ARROW-5855
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for the Duration type (added in C++: ARROW-835, ARROW-5261)
> - add DurationType and DurationArray wrappers
> - add inference support for datetime.timedelta / np.timedelta64



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5855) [Python] Add support for Duration type

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5855:
--
Labels: pull-request-available  (was: )

> [Python] Add support for Duration type
> --
>
> Key: ARROW-5855
> URL: https://issues.apache.org/jira/browse/ARROW-5855
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Add support for the Duration type (added in C++: ARROW-835, ARROW-5261)
> - add DurationType and DurationArray wrappers
> - add inference support for datetime.timedelta / np.timedelta64



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6756) [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`

2019-10-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942897#comment-16942897
 ] 

Wes McKinney commented on ARROW-6756:
-

Is this exposed in libhdfs?

> [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`
> ---
>
> Key: ARROW-6756
> URL: https://issues.apache.org/jira/browse/ARROW-6756
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: bb
>Priority: Major
>  Labels: arrow, hdfs, pyarrow
>
> Extended HDFS filesystem attributes are exposed through the `getfacl` command.
> It would be immensely helpful to have this information accessible via:
> {code:java}
> pyarrow.hdfs.HadoopFileSystem{code}
>  
> Link to the official Hadoop docs where this is discussed in more detail:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getfacl]
> Sample output from the *nix shell:
> {code:java}
> $ hadoop fs -getfacl /path/to/hdfs/dir
>  # file: /path/to/hdfs/dir
>  # owner: hive
>  # group: hive
>  user::rwx
>  group:unix_group_with_acl_privs_defined:rwx
>  group::---
>  user:hive:rwx
>  group:hive:rwx
>  mask::rwx
>  other::--x{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6756) [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`

2019-10-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6756:

Summary: [C++][Python] Include HDFS `getfacl` in 
`pyarrow.hdfs.HadoopFileSystem`  (was: Include HDFS `getfacl` in 
`pyarrow.hdfs.HadoopFileSystem`)

> [C++][Python] Include HDFS `getfacl` in `pyarrow.hdfs.HadoopFileSystem`
> ---
>
> Key: ARROW-6756
> URL: https://issues.apache.org/jira/browse/ARROW-6756
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: bb
>Priority: Major
>  Labels: arrow, hdfs, pyarrow
>
> Extended HDFS filesystem attributes are exposed through the `getfacl` command.
> It would be immensely helpful to have this information accessible via:
> {code:java}
> pyarrow.hdfs.HadoopFileSystem{code}
>  
> Link to the official Hadoop docs where this is discussed in more detail:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getfacl]
> Sample output from the *nix shell:
> {code:java}
> $ hadoop fs -getfacl /path/to/hdfs/dir
>  # file: /path/to/hdfs/dir
>  # owner: hive
>  # group: hive
>  user::rwx
>  group:unix_group_with_acl_privs_defined:rwx
>  group::---
>  user:hive:rwx
>  group:hive:rwx
>  mask::rwx
>  other::--x{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6770:
--

Assignee: Krisztian Szucs

> [CI][Travis] Download Minio quietly
> ---
>
> Key: ARROW-6770
> URL: https://issues.apache.org/jira/browse/ARROW-6770
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6770) [CI][Travis] Download Minio quietly

2019-10-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6770:
--

 Summary: [CI][Travis] Download Minio quietly
 Key: ARROW-6770
 URL: https://issues.apache.org/jira/browse/ARROW-6770
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Krisztian Szucs






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-5802) [CI] Dockerize "lint" Travis CI job

2019-10-02 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-5802:
-

Assignee: Francois Saint-Jacques

> [CI] Dockerize "lint" Travis CI job
> ---
>
> Key: ARROW-5802
> URL: https://issues.apache.org/jira/browse/ARROW-5802
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
> Fix For: 1.0.0
>
>
> Run via docker-compose; also enables contributors to lint locally



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-6768) [C++][Dataset] Implement dataset::Scan to Table helper function

2019-10-02 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6768:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] Implement dataset::Scan to Table helper function
> ---
>
> Key: ARROW-6768
> URL: https://issues.apache.org/jira/browse/ARROW-6768
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset
>
> The Scan interface exposes classes (ScanTask/Iterator) which are not of 
> interest to all callers. This would implement `Status 
> Scan::Materialize(std::shared_ptr<Table>* out)` so consumers can call 
> this function instead of consuming and dispatching the streaming interface.





[jira] [Created] (ARROW-6769) [C++][Dataset] End to End dataset integration test case

2019-10-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6769:
-

 Summary: [C++][Dataset] End to End dataset integration test case
 Key: ARROW-6769
 URL: https://issues.apache.org/jira/browse/ARROW-6769
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


1. Create a DataSource from a known directory and a PartitionScheme. 
2. Create a Dataset from the previous DataSource. 
3. Request a ScannerBuilder from previous Dataset. 
4. Add filter expression to ScannerBuilder (and other options). 
5. Finalize into a Scan operation. 
6. Materialize into an arrow::Table.
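The six steps above can be sketched with plain Python stand-ins. This is an illustrative mock of the intended flow, not the real Arrow C++ API; the class and method names (`DataSource`, `Dataset`, `ScannerBuilder`, `to_table`) are assumptions for the sketch.

```python
class DataSource:
    """Step 1: a source discovered from a directory + partition scheme."""
    def __init__(self, rows):
        self.rows = rows  # pretend these rows were discovered on disk


class Dataset:
    """Step 2: a dataset wrapping one or more sources."""
    def __init__(self, sources):
        self.sources = sources

    def new_scan(self):
        return ScannerBuilder(self)  # step 3: request a builder


class ScannerBuilder:
    def __init__(self, dataset):
        self.dataset = dataset
        self.predicate = lambda row: True

    def filter(self, predicate):  # step 4: add a filter expression
        self.predicate = predicate
        return self

    def finish(self):  # step 5: finalize into a Scan operation
        return Scanner(self.dataset, self.predicate)


class Scanner:
    def __init__(self, dataset, predicate):
        self.dataset = dataset
        self.predicate = predicate

    def to_table(self):  # step 6: materialize into a "table"
        return [row for src in self.dataset.sources
                for row in src.rows if self.predicate(row)]


source = DataSource([{"x": 1}, {"x": 2}, {"x": 3}])
table = Dataset([source]).new_scan().filter(lambda r: r["x"] > 1).finish().to_table()
print(table)  # [{'x': 2}, {'x': 3}]
```

An end-to-end integration test would assert the materialized table against a known fixture directory, exercising every stage in one pass.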

 





[jira] [Assigned] (ARROW-6769) [C++][Dataset] End to End dataset integration test case

2019-10-02 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6769:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] End to End dataset integration test case
> ---
>
> Key: ARROW-6769
> URL: https://issues.apache.org/jira/browse/ARROW-6769
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset
>
> 1. Create a DataSource from a known directory and a PartitionScheme. 
> 2. Create a Dataset from the previous DataSource. 
> 3. Request a ScannerBuilder from previous Dataset. 
> 4. Add filter expression to ScannerBuilder (and other options). 
> 5. Finalize into a Scan operation. 
> 6. Materialize into an arrow::Table.
>  





[jira] [Updated] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6767:
--
Labels: pull-request-available  (was: )

> [JS] lazily bind batches in scan/scanReverse
> 
>
> Key: ARROW-6767
> URL: https://issues.apache.org/jira/browse/ARROW-6767
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Taylor Baldwin
>Priority: Minor
>  Labels: pull-request-available
>
> Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only 
> when the predicate has matched a record in a batch.





[jira] [Created] (ARROW-6768) [C++][Dataset] Implement dataset::Scan to Table helper function

2019-10-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6768:
-

 Summary: [C++][Dataset] Implement dataset::Scan to Table helper 
function
 Key: ARROW-6768
 URL: https://issues.apache.org/jira/browse/ARROW-6768
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


The Scan interface exposes classes (ScanTask/Iterator) which are not of
interest to all callers. This would implement `Status
Scan::Materialize(std::shared_ptr<Table>* out)` so consumers can call
this function instead of consuming and dispatching the streaming interface.
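The shape of the proposed helper can be shown in a few lines of plain Python: instead of every caller draining the ScanTask/Iterator stream itself, one `materialize()` consumes the stream and concatenates the batches. The names here are illustrative stand-ins, not the actual Arrow API.

```python
def scan_tasks():
    # Stand-in for the streaming Scan interface: each task yields a batch.
    yield [1, 2]
    yield [3]
    yield [4, 5]


def materialize(tasks):
    """Drain the stream of batches and concatenate into one 'table' (a flat list)."""
    table = []
    for batch in tasks:
        table.extend(batch)
    return table


print(materialize(scan_tasks()))  # [1, 2, 3, 4, 5]
```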





[jira] [Created] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse

2019-10-02 Thread Taylor Baldwin (Jira)
Taylor Baldwin created ARROW-6767:
-

 Summary: [JS] lazily bind batches in scan/scanReverse
 Key: ARROW-6767
 URL: https://issues.apache.org/jira/browse/ARROW-6767
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Taylor Baldwin


Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only when 
the predicate has matched a record in a batch.
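The lazy-bind idea can be sketched in plain Python (this is not the actual Arrow JS implementation; `bind`, `scan`, and the batch layout are illustrative): the per-batch `bind` cost is paid only for batches in which the predicate actually matches a record.

```python
bind_calls = []


def bind(batch_index):
    # Expensive per-batch setup; we want to skip it for batches with no match.
    bind_calls.append(batch_index)


def scan(batches, predicate, on_match):
    for i, batch in enumerate(batches):
        bound = False
        for record in batch:
            if predicate(record):
                if not bound:  # bind lazily, on first match only
                    bind(i)
                    bound = True
                on_match(record)


matches = []
scan([[1, 3], [2, 4], [5, 7]], lambda r: r % 2 == 0, matches.append)
print(matches, bind_calls)  # [2, 4] [1]  -- only batch 1 was ever bound
```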





[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-6766:
---
Description: 
After following the instructions found on the developer guides for Python, I
was able to build fine by using:

# Assuming immediately prior one has run:
# $ git clone g...@github.com:apache/arrow.git
# $ conda create -y -n pyarrow-dev -c conda-forge \
#     --file arrow/ci/conda_env_unix.yml \
#     --file arrow/ci/conda_env_cpp.yml \
#     --file arrow/ci/conda_env_python.yml \
#     compilers \
#     python=3.7
# $ conda activate pyarrow-dev
# $ brew update && brew bundle --file=arrow/cpp/Brewfile

export ARROW_HOME=$(pwd)/arrow/dist
export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
export CC=`which clang`
export CXX=`which clang++`

mkdir arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=OFF \
      -DARROW_GANDIVA=OFF \
      -DARROW_ORC=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_BUILD_TESTS=ON \
      ..
make -j4
make install
popd

But when I run:

pushd arrow/python
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd

I get the following errors:

-- Build output directory: /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
-- Found the Arrow core library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
-- Found the Arrow Python library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
...
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:315 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:226 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)

 

What is quite strange is that the libraries do seem to be there, but they
have an additional version component, e.g. `libarrow.15.dylib`:

$ ls -l libarrow_python.15.dylib && echo $PWD
lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> libarrow_python.15.0.0.dylib
/Users/tallamjr/github/arrow/dist/lib

I am not exactly sure what the issue is, but it appears that the version is
not captured in a variable used by CMake. I have run the same setup on
`master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both
produce the same errors.

Apologies if this is not quite the right format for JIRA issues, or if this
is not the correct platform for it; I'm very new to the project and to
contributing to Apache in general. Thanks.

 

  was:
{{After following the instructions found on the developer guides for Python, I 
was}}
 {{able to build fine by using:}}

{{# Assuming immediately prior one has run:}}
 {{# $ git clone g...@github.com:apache/arrow.git}}
 # $ conda create -y -n pyarrow-dev -c conda-forge 
 #   --file arrow/ci/conda_env_unix.yml 
 #   --file arrow/ci/conda_env_cpp.yml 
 #   --file arrow/ci/conda_env_python.yml 
 #    compilers 
 {{#  python=3.7}}
 {{# $ conda activate pyarrow-dev}}
 {{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
ARROW_HOME=$(pwd)/arrow/dist}}
 {{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
CC=`which clang`}}
 {{export CXX=`which clang++`}}{{mkdir arrow/cpp/build}}
 {{pushd arrow/cpp/build \}}
    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DARROW_FLIGHT=OFF \
    -DARROW_GANDIVA=OFF \
    -DARROW_ORC=ON \
    -DARROW_PARQUET=ON \
    -DARROW_PYTHON=ON \
    -DARROW_PLASMA=ON \
    -DARROW_BUILD_TESTS=ON \
   ..
 {{make -j4}}
 {{make install}}
 {{popd}}

But when I run:

{{pushd arrow/python}}
 {{export PYARROW_WITH_FLIGHT=1}}
 {{export PYARROW_WITH_GANDIVA=1}}
 {{export PYARROW_WITH_ORC=1}}
 {{export PYARROW_WITH_PARQUET=1}}
 {{python setup.py build_ext --inplace}}
 {{popd}}

I get the following errors:

{{-- Build output directory: 

[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-6766:
---
Description: 
After following the instructions found on the developer guides for Python, I
was able to build fine by using:

# Assuming immediately prior one has run:
# $ git clone g...@github.com:apache/arrow.git
# $ conda create -y -n pyarrow-dev -c conda-forge \
#     --file arrow/ci/conda_env_unix.yml \
#     --file arrow/ci/conda_env_cpp.yml \
#     --file arrow/ci/conda_env_python.yml \
#     compilers \
#     python=3.7
# $ conda activate pyarrow-dev
# $ brew update && brew bundle --file=arrow/cpp/Brewfile

export ARROW_HOME=$(pwd)/arrow/dist
export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
export CC=`which clang`
export CXX=`which clang++`

mkdir arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=OFF \
      -DARROW_GANDIVA=OFF \
      -DARROW_ORC=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_BUILD_TESTS=ON \
      ..
make -j4
make install
popd

But when I run:

pushd arrow/python
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd

I get the following errors:

-- Build output directory: /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
-- Found the Arrow core library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
-- Found the Arrow Python library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
...
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:315 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:226 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)

 

What is quite strange is that the libraries do seem to be there, but they
have an additional version component, e.g. `libarrow.15.dylib`:

$ ls -l libarrow_python.15.dylib && echo $PWD
lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> libarrow_python.15.0.0.dylib
/Users/tallamjr/github/arrow/dist/lib

I am not exactly sure what the issue is, but it appears that the version is
not captured in a variable used by CMake. I have run the same setup on
`master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both
produce the same errors.

Apologies if this is not quite the right format for JIRA issues, or if this
is not the correct platform for it; I'm very new to the project and to
contributing to Apache in general. Thanks.

 

  was:
{{After following the instructions found on the developer guides for Python, I 
was}}
 {{able to build fine by using:}}

{{# Assuming immediately prior one has run:}}
 {{# $ git clone g...@github.com:apache/arrow.git}}
# $ conda create -y -n pyarrow-dev -c conda-forge 
#   --file arrow/ci/conda_env_unix.yml 
#   --file arrow/ci/conda_env_cpp.yml 
#   --file arrow/ci/conda_env_python.yml 
#    compilers 
 {{#  python=3.7}}
 {{# $ conda activate pyarrow-dev}}
 {{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
ARROW_HOME=$(pwd)/arrow/dist}}
 {{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
CC=`which clang`}}
 {{export CXX=`which clang++`}}{{mkdir arrow/cpp/build}}
 {{pushd arrow/cpp/build}}{\{cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME }}
 \{{ -DCMAKE_INSTALL_LIBDIR=lib }}
 \{{ -DARROW_FLIGHT=OFF }}
 \{{ -DARROW_GANDIVA=OFF }}
 \{{ -DARROW_ORC=ON }}
 \{{ -DARROW_PARQUET=ON }}
 \{{ -DARROW_PYTHON=ON }}
 \{{ -DARROW_PLASMA=ON }}
 \{{ -DARROW_BUILD_TESTS=ON }}
 \{{ ..}}
 {{make -j4}}
 {{make install}}
 {{popd}}

But when I run:

{{pushd arrow/python}}
 {{export PYARROW_WITH_FLIGHT=1}}
 {{export PYARROW_WITH_GANDIVA=1}}
 {{export PYARROW_WITH_ORC=1}}
 {{export PYARROW_WITH_PARQUET=1}}
 {{python setup.py build_ext --inplace}}
 {{popd}}

I get the following errors:

{{-- Build output directory: 

[jira] [Commented] (ARROW-6765) 0.14.1 not available on Windows

2019-10-02 Thread Yannik (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942852#comment-16942852
 ] 

Yannik commented on ARROW-6765:
---

Thanks for pointing that out!

> 0.14.1 not available on Windows
> ---
>
> Key: ARROW-6765
> URL: https://issues.apache.org/jira/browse/ARROW-6765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
> Environment: Windows
>Reporter: Yannik
>Priority: Major
>
> On linux, I can install pyarrow 0.14.1 from pip, but on windows the latest 
> seems to be 0.14.0. Why is that?





[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-6766:
---
Description: 
After following the instructions found on the developer guides for Python, I
was able to build fine by using:

# Assuming immediately prior one has run:
# $ git clone g...@github.com:apache/arrow.git
# $ conda create -y -n pyarrow-dev -c conda-forge \
#     --file arrow/ci/conda_env_unix.yml \
#     --file arrow/ci/conda_env_cpp.yml \
#     --file arrow/ci/conda_env_python.yml \
#     compilers \
#     python=3.7
# $ conda activate pyarrow-dev
# $ brew update && brew bundle --file=arrow/cpp/Brewfile

export ARROW_HOME=$(pwd)/arrow/dist
export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
export CC=`which clang`
export CXX=`which clang++`

mkdir arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=OFF \
      -DARROW_GANDIVA=OFF \
      -DARROW_ORC=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_BUILD_TESTS=ON \
      ..
make -j4
make install
popd

But when I run:

pushd arrow/python
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd

I get the following errors:

-- Build output directory: /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
-- Found the Arrow core library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
-- Found the Arrow Python library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
...
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:315 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:226 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)

 

What is quite strange is that the libraries do seem to be there, but they
have an additional version component, e.g. `libarrow.15.dylib`:

$ ls -l libarrow_python.15.dylib && echo $PWD
lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> libarrow_python.15.0.0.dylib
/Users/tallamjr/github/arrow/dist/lib

I am not exactly sure what the issue is, but it appears that the version is
not captured in a variable used by CMake. I have run the same setup on
`master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both
produce the same errors.

Apologies if this is not quite the right format for JIRA issues, or if this
is not the correct platform for it; I'm very new to the project and to
contributing to Apache in general. Thanks.

 

  was:
{{After following the instructions found on the developer guides for Python, I 
was}}
{{able to build fine by using:}}

{{# Assuming immediately prior one has run:}}
{{# $ git clone g...@github.com:apache/arrow.git}}
{{# $ conda create -y -n pyarrow-dev -c conda-forge \}}
{{# --file arrow/ci/conda_env_unix.yml \}}
{{# --file arrow/ci/conda_env_cpp.yml \}}
{{# --file arrow/ci/conda_env_python.yml \}}
{{# compilers \}}
{{# python=3.7}}
{{# $ conda activate pyarrow-dev}}
{{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
ARROW_HOME=$(pwd)/arrow/dist}}
{{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
CC=`which clang`}}
{{export CXX=`which clang++`}}{{mkdir arrow/cpp/build}}
{{pushd arrow/cpp/build}}{{cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \}}
{{ -DCMAKE_INSTALL_LIBDIR=lib \}}
{{ -DARROW_FLIGHT=OFF \}}
{{ -DARROW_GANDIVA=OFF \}}
{{ -DARROW_ORC=ON \}}
{{ -DARROW_PARQUET=ON \}}
{{ -DARROW_PYTHON=ON \}}
{{ -DARROW_PLASMA=ON \}}
{{ -DARROW_BUILD_TESTS=ON \}}
{{ ..}}
{{make -j4}}
{{make install}}
{{popd}}

But when I run:


{{pushd arrow/python}}
{{export PYARROW_WITH_FLIGHT=1}}
{{export PYARROW_WITH_GANDIVA=1}}
{{export PYARROW_WITH_ORC=1}}
{{export PYARROW_WITH_PARQUET=1}}
{{python setup.py build_ext --inplace}}
{{popd}}

I get the following errors:


{{-- Build output 

[jira] [Updated] (ARROW-6766) [Python] libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarek Allam updated ARROW-6766:
---
Summary: [Python] libarrow_python..dylib does not exist  (was: 
libarrow_python..dylib does not exist)

> [Python] libarrow_python..dylib does not exist
> --
>
> Key: ARROW-6766
> URL: https://issues.apache.org/jira/browse/ARROW-6766
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0, 0.15.0
>Reporter: Tarek Allam
>Priority: Blocker
>
> {{After following the instructions found on the developer guides for Python, 
> I was}}
> {{able to build fine by using:}}
> {{# Assuming immediately prior one has run:}}
> {{# $ git clone g...@github.com:apache/arrow.git}}
> {{# $ conda create -y -n pyarrow-dev -c conda-forge \}}
> {{# --file arrow/ci/conda_env_unix.yml \}}
> {{# --file arrow/ci/conda_env_cpp.yml \}}
> {{# --file arrow/ci/conda_env_python.yml \}}
> {{# compilers \}}
> {{# python=3.7}}
> {{# $ conda activate pyarrow-dev}}
> {{# $ brew update && brew bundle --file=arrow/cpp/Brewfile}}{{export 
> ARROW_HOME=$(pwd)/arrow/dist}}
> {{export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH}}{{export 
> CC=`which clang`}}
> {{export CXX=`which clang++`}}{{mkdir arrow/cpp/build}}
> {{pushd arrow/cpp/build}}{{cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \}}
> {{ -DCMAKE_INSTALL_LIBDIR=lib \}}
> {{ -DARROW_FLIGHT=OFF \}}
> {{ -DARROW_GANDIVA=OFF \}}
> {{ -DARROW_ORC=ON \}}
> {{ -DARROW_PARQUET=ON \}}
> {{ -DARROW_PYTHON=ON \}}
> {{ -DARROW_PLASMA=ON \}}
> {{ -DARROW_BUILD_TESTS=ON \}}
> {{ ..}}
> {{make -j4}}
> {{make install}}
> {{popd}}
> But when I run:
> {{pushd arrow/python}}
> {{export PYARROW_WITH_FLIGHT=1}}
> {{export PYARROW_WITH_GANDIVA=1}}
> {{export PYARROW_WITH_ORC=1}}
> {{export PYARROW_WITH_PARQUET=1}}
> {{python setup.py build_ext --inplace}}
> {{popd}}
> I get the following errors:
> {{-- Build output directory: 
> /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release}}
> {{-- Found the Arrow core library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib}}
> {{-- Found the Arrow Python library: 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib}}
> {{CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib 
> does not exist.}}{{...}}{{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.}}
> {{CMake Error at CMakeLists.txt:230 (configure_file):}}
> {{ configure_file Problem configuring file}}
> {{Call Stack (most recent call first):}}
> {{ CMakeLists.txt:315 (bundle_arrow_lib)}}
> {{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.}}
> {{CMake Error at CMakeLists.txt:226 (configure_file):}}
> {{ configure_file Problem configuring file}}
> {{Call Stack (most recent call first):}}
> {{ CMakeLists.txt:320 (bundle_arrow_lib)}}
> {{CMake Error: File 
> /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not 
> exist.}}
> {{CMake Error at CMakeLists.txt:230 (configure_file):}}
> {{ configure_file Problem configuring file}}
> {{Call Stack (most recent call first):}}
> {{ CMakeLists.txt:320 (bundle_arrow_lib)}}
>  
> What is quite strange is that the libraries do seem to be there, but they
> have an additional version component, e.g. `libarrow.15.dylib`:
> {{$ ls -l libarrow_python.15.dylib && echo $PWD}}
> {{lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib ->}}
> {{libarrow_python.15.0.0.dylib}}
> {{/Users/tallamjr/github/arrow/dist/lib}}
> I am not exactly sure what the issue is, but it appears that the version is
> not captured in a variable used by CMake. I have run the same setup on
> `master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both
> produce the same errors.
> Apologies if this is not quite the right format for JIRA issues, or if this
> is not the correct platform for it; I'm very new to the project and to
> contributing to Apache in general. Thanks.
>  





[jira] [Created] (ARROW-6766) libarrow_python..dylib does not exist

2019-10-02 Thread Tarek Allam (Jira)
Tarek Allam created ARROW-6766:
--

 Summary: libarrow_python..dylib does not exist
 Key: ARROW-6766
 URL: https://issues.apache.org/jira/browse/ARROW-6766
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.14.0, 0.15.0
Reporter: Tarek Allam


After following the instructions found on the developer guides for Python, I
was able to build fine by using:

# Assuming immediately prior one has run:
# $ git clone g...@github.com:apache/arrow.git
# $ conda create -y -n pyarrow-dev -c conda-forge \
#     --file arrow/ci/conda_env_unix.yml \
#     --file arrow/ci/conda_env_cpp.yml \
#     --file arrow/ci/conda_env_python.yml \
#     compilers \
#     python=3.7
# $ conda activate pyarrow-dev
# $ brew update && brew bundle --file=arrow/cpp/Brewfile

export ARROW_HOME=$(pwd)/arrow/dist
export LD_LIBRARY_PATH=$(pwd)/arrow/dist/lib:$LD_LIBRARY_PATH
export CC=`which clang`
export CXX=`which clang++`

mkdir arrow/cpp/build
pushd arrow/cpp/build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DARROW_FLIGHT=OFF \
      -DARROW_GANDIVA=OFF \
      -DARROW_ORC=ON \
      -DARROW_PARQUET=ON \
      -DARROW_PYTHON=ON \
      -DARROW_PLASMA=ON \
      -DARROW_BUILD_TESTS=ON \
      ..
make -j4
make install
popd

But when I run:

pushd arrow/python
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
python setup.py build_ext --inplace
popd

I get the following errors:

-- Build output directory: /Users/tallamjr/Github/arrow/python/build/temp.macosx-10.9-x86_64-3.7/release
-- Found the Arrow core library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow.dylib
-- Found the Arrow Python library: /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python.dylib
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
...
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:315 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:226 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)
CMake Error: File /usr/local/anaconda3/envs/pyarrow-dev/lib/libarrow_python..dylib does not exist.
CMake Error at CMakeLists.txt:230 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:320 (bundle_arrow_lib)

 

What is quite strange is that the libraries do seem to be there, but they
have an additional version component, e.g. `libarrow.15.dylib`:

$ ls -l libarrow_python.15.dylib && echo $PWD
lrwxr-xr-x 1 tallamjr staff 28 Oct 2 14:02 libarrow_python.15.dylib -> libarrow_python.15.0.0.dylib
/Users/tallamjr/github/arrow/dist/lib

I am not exactly sure what the issue is, but it appears that the version is
not captured in a variable used by CMake. I have run the same setup on
`master` (`7d18c1c`) and on `apache-arrow-0.14.0` (`a591d76`), which both
produce the same errors.
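The double-dot filename is consistent with a version variable being interpolated as an empty string when the library name is assembled. A tiny illustration of that hypothesis (this is not the actual CMake logic, and `bundled_name`/`so_version` are made-up names):

```python
def bundled_name(base, so_version):
    # If so_version comes through empty, the two dots collapse together.
    return f"lib{base}.{so_version}.dylib"


print(bundled_name("arrow_python", "15"))  # libarrow_python.15.dylib
print(bundled_name("arrow_python", ""))   # libarrow_python..dylib  <- the reported error
```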

Apologies if this is not quite the right format for JIRA issues, or if this
is not the correct platform for it; I'm very new to the project and to
contributing to Apache in general. Thanks.

 





[jira] [Resolved] (ARROW-6755) [Release] Improvements to Windows release verification script

2019-10-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6755.
---
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5559
[https://github.com/apache/arrow/pull/5559]

> [Release] Improvements to Windows release verification script
> -
>
> Key: ARROW-6755
> URL: https://issues.apache.org/jira/browse/ARROW-6755
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Only build dynamic libraries (we don't need the static libs to verify, and 
> I got "compiler is out of heap space" errors when I built locally just now, 
> will have to investigate that some more later)
> * Maybe some other things





[jira] [Assigned] (ARROW-6755) [Release] Improvements to Windows release verification script

2019-10-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-6755:
-

Assignee: Wes McKinney

> [Release] Improvements to Windows release verification script
> -
>
> Key: ARROW-6755
> URL: https://issues.apache.org/jira/browse/ARROW-6755
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> * Only build dynamic libraries (we don't need the static libs to verify, and 
> I got "compiler is out of heap space" errors when I built locally just now, 
> will have to investigate that some more later)
> * Maybe some other things





[jira] [Closed] (ARROW-6765) 0.14.1 not available on Windows

2019-10-02 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6765.
---
Resolution: Won't Fix

> 0.14.1 not available on Windows
> ---
>
> Key: ARROW-6765
> URL: https://issues.apache.org/jira/browse/ARROW-6765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
> Environment: Windows
>Reporter: Yannik
>Priority: Major
>
> On linux, I can install pyarrow 0.14.1 from pip, but on windows the latest 
> seems to be 0.14.0. Why is that?





[jira] [Assigned] (ARROW-2863) [Python] Add context manager APIs to RecordBatch*Writer/Reader classes

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-2863:
--

Assignee: Krisztian Szucs

> [Python] Add context manager APIs to RecordBatch*Writer/Reader classes
> --
>
> Key: ARROW-2863
> URL: https://issues.apache.org/jira/browse/ARROW-2863
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This would cause the {{close}} method to be called when the scope exits
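What the requested context-manager support amounts to can be sketched with a mock writer (`MockWriter` stands in for the real RecordBatch*Writer classes; the real implementation would add `__enter__`/`__exit__` to them):

```python
class MockWriter:
    def __init__(self):
        self.closed = False

    def write_batch(self, batch):
        assert not self.closed, "cannot write after close"

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()      # guarantee close() when the with-block exits
        return False      # do not swallow exceptions


with MockWriter() as w:
    w.write_batch("batch-0")

print(w.closed)  # True
```

The key property is that `close()` runs even if the body raises, which is exactly what manual `try/finally` code around these writers does today.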





[jira] [Commented] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942841#comment-16942841
 ] 

Joris Van den Bossche commented on ARROW-5655:
--

[~kszucs] I think this might already be fixed in the meantime. Wes and I did 
some work related to schema handling over the last month.

> [Python] Table.from_pydict/from_arrays not using types in specified schema 
> correctly 
> -
>
> Key: ARROW-5655
> URL: https://issues.apache.org/jira/browse/ARROW-5655
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from 
> https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [15]: table = pa.Table.from_pydict(
> ...: {'a': [1, 2, 3], 'b': [3, 4, 5]},
> ...: schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]: 
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]: 
>a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema has 1) different column names and 2) has a 
> non-default type (int32 vs int64) which leads to corrupted values.
> This is partly due to {{Table.from_pydict}} not using the type information in 
> the schema to convert the dictionary items to pyarrow arrays. But then it is 
> also {{Table.from_arrays}} that is not correctly casting the arrays to 
> another dtype if the schema specifies as such.
> Additional question for {{Table.from_pydict}} is whether it actually should 
> override the 'b' key from the dictionary as column 'c' as defined in the 
> schema (this behaviour depends on the order of the dictionary, which is not 
> guaranteed before Python 3.6).





[jira] [Commented] (ARROW-6765) 0.14.1 not available on Windows

2019-10-02 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942839#comment-16942839
 ] 

Krisztian Szucs commented on ARROW-6765:


The shipped 0.14.1 wheels were broken for Windows, so we've decided to remove 
them.
The linking issue affecting the 0.14.1 wheels should be fixed now by 
https://issues.apache.org/jira/browse/ARROW-6584

The current 0.15 release is under vote; you can try to download and install the 
release candidate wheel from:
https://bintray.com/apache/arrow/python-rc/0.15.0-rc2#files/python-rc/0.15.0-rc2

> 0.14.1 not available on Windows
> ---
>
> Key: ARROW-6765
> URL: https://issues.apache.org/jira/browse/ARROW-6765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
> Environment: Windows
>Reporter: Yannik
>Priority: Major
>
> On Linux, I can install pyarrow 0.14.1 from pip, but on Windows the latest 
> seems to be 0.14.0. Why is that?





[jira] [Commented] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017

2019-10-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942835#comment-16942835
 ] 

Wes McKinney commented on ARROW-6757:
-

I haven't had a chance to investigate further yet. Will report back once I 
learn something

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access 
> violation" with Visual Studio 2017
> ---
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> I encountered this when trying to verify the release with MSVC 2017. It may 
> be particular to this machine or build (though it's 100% reproducible for 
> me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) 
> [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}





[jira] [Assigned] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-5655:
--

Assignee: Krisztian Szucs

> [Python] Table.from_pydict/from_arrays not using types in specified schema 
> correctly 
> -
>
> Key: ARROW-5655
> URL: https://issues.apache.org/jira/browse/ARROW-5655
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from 
> https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [15]: table = pa.Table.from_pydict(
> ...: {'a': [1, 2, 3], 'b': [3, 4, 5]},
> ...: schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]: 
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]: 
>a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema has 1) different column names and 2) has a 
> non-default type (int32 vs int64) which leads to corrupted values.
> This is partly due to {{Table.from_pydict}} not using the type information in 
> the schema to convert the dictionary items to pyarrow arrays. But then it is 
> also {{Table.from_arrays}} that is not correctly casting the arrays to 
> another dtype if the schema specifies as such.
> Additional question for {{Table.from_pydict}} is whether it actually should 
> override the 'b' key from the dictionary as column 'c' as defined in the 
> schema (this behaviour depends on the order of the dictionary, which is not 
> guaranteed before Python 3.6).





[jira] [Commented] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942811#comment-16942811
 ] 

Antoine Pitrou commented on ARROW-6762:
---

Well, the attached PR removes the limitation then.

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Comment Edited] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942801#comment-16942801
 ] 

Ben Kietzman edited comment on ARROW-6762 at 10/2/19 1:19 PM:
--

This is a [stated 
limitation|https://github.com/apache/arrow/blob/master/cpp/src/arrow/json/options.h#L50]
 of the JSON parser when parsing with strict newline delimiters. Still, it 
shouldn't crash; we should probably change the debug assertion to an 
informative error message suggesting {{newlines_in_values=true}} or appending 
an empty line.


was (Author: bkietz):
This is a [stated 
limitation|https://github.com/apache/arrow/blob/master/cpp/src/arrow/json/options.h#L50]
 of the JSON parser when parsing with strict newline delimiters. Still, it 
shouldn't crash. I'll change the debug assertion to an informative error 
message suggesting {{newlines_in_values=true}} or appending an empty line.

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Commented] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942801#comment-16942801
 ] 

Ben Kietzman commented on ARROW-6762:
-

This is a [stated 
limitation|https://github.com/apache/arrow/blob/master/cpp/src/arrow/json/options.h#L50]
 of the JSON parser when parsing with strict newline delimiters. Still, it 
shouldn't crash. I'll change the debug assertion to an informative error 
message suggesting {{newlines_in_values=true}} or appending an empty line.

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Assigned] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-6760:
---

Assignee: Ben Kietzman

> [C++] JSON: improve error message when column changed type
> --
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Assignee: Ben Kietzman
>Priority: Major
> Attachments: dummy.jl
>
>
> When a column accidentally changes type in a JSON file (which is not 
> supported), it would be nice to include the name of the offending column in 
> the error message.
> ---
> I am trying to parse a simple JSON file. While doing so, I am getting the 
> error {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Updated] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6762:
--
Labels: json pull-request-available  (was: json)

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json, pull-request-available
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Updated] (ARROW-2863) [Python] Add context manager APIs to RecordBatch*Writer/Reader classes

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2863:
--
Labels: pull-request-available  (was: )

> [Python] Add context manager APIs to RecordBatch*Writer/Reader classes
> --
>
> Key: ARROW-2863
> URL: https://issues.apache.org/jira/browse/ARROW-2863
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> This would cause the {{close}} method to be called when the scope exits





[jira] [Updated] (ARROW-6765) 0.14.1 not available on Windows

2019-10-02 Thread Yannik (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yannik updated ARROW-6765:
--
Component/s: Python

> 0.14.1 not available on Windows
> ---
>
> Key: ARROW-6765
> URL: https://issues.apache.org/jira/browse/ARROW-6765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
> Environment: Windows
>Reporter: Yannik
>Priority: Major
>
> On Linux, I can install pyarrow 0.14.1 from pip, but on Windows the latest 
> seems to be 0.14.0. Why is that?





[jira] [Created] (ARROW-6765) 0.14.1 not available on Windows

2019-10-02 Thread Yannik (Jira)
Yannik created ARROW-6765:
-

 Summary: 0.14.1 not available on Windows
 Key: ARROW-6765
 URL: https://issues.apache.org/jira/browse/ARROW-6765
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.14.1
 Environment: Windows
Reporter: Yannik


On Linux, I can install pyarrow 0.14.1 from pip, but on Windows the latest 
seems to be 0.14.0. Why is that?





[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512

2019-10-02 Thread Charles Coulombe (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942740#comment-16942740
 ] 

Charles Coulombe commented on ARROW-6213:
-

Ok, will try what you suggest. I'll keep you posted.

 

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt, 
> easybuild-arrow-0.14.1-20190809.34.MgMEK.log
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  





[jira] [Assigned] (ARROW-6750) [Python] test_fs.py not silent

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-6750:
--

Assignee: Antoine Pitrou

> [Python] test_fs.py not silent
> --
>
> Key: ARROW-6750
> URL: https://issues.apache.org/jira/browse/ARROW-6750
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some errors get displayed at the end of {{test_fs.py}}:
> {code}
> $ python -m pytest --tb=native pyarrow/tests/test_fs.py 
> === 
> test session starts 
> 
> platform linux -- Python 3.7.3, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
> hypothesis profile 'dev' -> max_examples=10, 
> database=DirectoryBasedExampleDatabase('/home/antoine/arrow/dev/python/.hypothesis/examples')
> rootdir: /home/antoine/arrow/dev/python, inifile: setup.cfg
> plugins: timeout-1.3.3, repeat-0.8.0, hypothesis-3.82.1, lazy-fixture-0.5.2, 
> forked-1.0.2, xdist-1.28.0
> collected 90 items
>   
>
> pyarrow/tests/test_fs.py 
> ..
>   [100%]
>  
> 90 passed in 1.33s 
> 
> 19-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC2359A9C
> x-xss-protection : 1; mode=block
> [ERROR] 2019-10-01 13:29:28.597 AWSClient [139765563750208] HTTP response 
> code: 409
> Exception name: BucketAlreadyOwnedByYou
> Error message: Your previous request to create the named bucket succeeded and 
> you already own it.
> 9 response headers:
> accept-ranges : bytes
> content-length : 366
> content-security-policy : block-all-mixed-content
> content-type : application/xml
> date : Tue, 01 Oct 2019 13:29:28 GMT
> server : MinIO/RELEASE.2019-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC317620E
> x-xss-protection : 1; mode=block
> [etc.]
> {code}





[jira] [Updated] (ARROW-6750) [Python] Silence S3 error logs by default

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-6750:
---
Summary: [Python] Silence S3 error logs by default  (was: [Python] 
test_fs.py not silent)

> [Python] Silence S3 error logs by default
> -
>
> Key: ARROW-6750
> URL: https://issues.apache.org/jira/browse/ARROW-6750
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some errors get displayed at the end of {{test_fs.py}}:
> {code}
> $ python -m pytest --tb=native pyarrow/tests/test_fs.py 
> === 
> test session starts 
> 
> platform linux -- Python 3.7.3, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
> hypothesis profile 'dev' -> max_examples=10, 
> database=DirectoryBasedExampleDatabase('/home/antoine/arrow/dev/python/.hypothesis/examples')
> rootdir: /home/antoine/arrow/dev/python, inifile: setup.cfg
> plugins: timeout-1.3.3, repeat-0.8.0, hypothesis-3.82.1, lazy-fixture-0.5.2, 
> forked-1.0.2, xdist-1.28.0
> collected 90 items
>   
>
> pyarrow/tests/test_fs.py 
> ..
>   [100%]
>  
> 90 passed in 1.33s 
> 
> 19-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC2359A9C
> x-xss-protection : 1; mode=block
> [ERROR] 2019-10-01 13:29:28.597 AWSClient [139765563750208] HTTP response 
> code: 409
> Exception name: BucketAlreadyOwnedByYou
> Error message: Your previous request to create the named bucket succeeded and 
> you already own it.
> 9 response headers:
> accept-ranges : bytes
> content-length : 366
> content-security-policy : block-all-mixed-content
> content-type : application/xml
> date : Tue, 01 Oct 2019 13:29:28 GMT
> server : MinIO/RELEASE.2019-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC317620E
> x-xss-protection : 1; mode=block
> [etc.]
> {code}





[jira] [Resolved] (ARROW-6750) [Python] test_fs.py not silent

2019-10-02 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6750.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5553
[https://github.com/apache/arrow/pull/5553]

> [Python] test_fs.py not silent
> --
>
> Key: ARROW-6750
> URL: https://issues.apache.org/jira/browse/ARROW-6750
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some errors get displayed at the end of {{test_fs.py}}:
> {code}
> $ python -m pytest --tb=native pyarrow/tests/test_fs.py 
> === 
> test session starts 
> 
> platform linux -- Python 3.7.3, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
> hypothesis profile 'dev' -> max_examples=10, 
> database=DirectoryBasedExampleDatabase('/home/antoine/arrow/dev/python/.hypothesis/examples')
> rootdir: /home/antoine/arrow/dev/python, inifile: setup.cfg
> plugins: timeout-1.3.3, repeat-0.8.0, hypothesis-3.82.1, lazy-fixture-0.5.2, 
> forked-1.0.2, xdist-1.28.0
> collected 90 items
>   
>
> pyarrow/tests/test_fs.py 
> ..
>   [100%]
>  
> 90 passed in 1.33s 
> 
> 19-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC2359A9C
> x-xss-protection : 1; mode=block
> [ERROR] 2019-10-01 13:29:28.597 AWSClient [139765563750208] HTTP response 
> code: 409
> Exception name: BucketAlreadyOwnedByYou
> Error message: Your previous request to create the named bucket succeeded and 
> you already own it.
> 9 response headers:
> accept-ranges : bytes
> content-length : 366
> content-security-policy : block-all-mixed-content
> content-type : application/xml
> date : Tue, 01 Oct 2019 13:29:28 GMT
> server : MinIO/RELEASE.2019-08-07T01-59-21Z
> vary : Origin
> x-amz-request-id : 15C988FDC317620E
> x-xss-protection : 1; mode=block
> [etc.]
> {code}





[jira] [Created] (ARROW-6764) [C++] Simplify readahead implementation

2019-10-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6764:
-

 Summary: [C++] Simplify readahead implementation
 Key: ARROW-6764
 URL: https://issues.apache.org/jira/browse/ARROW-6764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


The current implementation is very ad-hoc and allows unused padding arguments.

We could refactor it using the Iterator facility.





[jira] [Assigned] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-6762:
-

Assignee: Antoine Pitrou

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: json
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Commented] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942607#comment-16942607
 ] 

Rok Mihevc commented on ARROW-4226:
---

I was just reading it :)
I'll start working on this sometime this week.
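For background, CSR/CSC compress one coordinate axis of a COO layout into a row-offsets array, and CSF generalises that compression to N dimensions. A minimal pure-Python sketch of the COO-to-CSR step (illustrative only; not the Arrow C++ API):

```python
def coo_to_csr(rows, cols, data, n_rows):
    """Compress sorted COO row indices into a CSR offsets array (indptr)."""
    triples = sorted(zip(rows, cols, data))
    indptr = [0] * (n_rows + 1)
    for r, _, _ in triples:
        indptr[r + 1] += 1            # count nonzeros per row
    for i in range(1, n_rows + 1):
        indptr[i] += indptr[i - 1]    # prefix-sum counts into row offsets
    indices = [c for _, c, _ in triples]
    values = [v for _, _, v in triples]
    return indptr, indices, values
```

CSF repeats this "sort, count, prefix-sum" compression along each tensor mode in turn, which is why it subsumes CSR and CSC as the 2-D special cases.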

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Commented] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942602#comment-16942602
 ] 

Kenta Murata commented on ARROW-4226:
-

[~rokm] Did you check the issue 
[pydata/sparse#125|https://github.com/pydata/sparse/issues/125]?

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Updated] (ARROW-6763) [Python] Parquet s3 tests are skipped because dependencies are not installed

2019-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6763:
--
Labels: pull-request-available  (was: )

> [Python] Parquet s3 tests are skipped because dependencies are not installed
> 
>
> Key: ARROW-6763
> URL: https://issues.apache.org/jira/browse/ARROW-6763
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the s3 parquet test is skipped on both Travis and ursabot





[jira] [Assigned] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-4226:
-

Assignee: Rok Mihevc  (was: Kenta Murata)

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942598#comment-16942598
 ] 

Rok Mihevc commented on ARROW-4225:
---

Ok, I'll check the paper and see if I can get somewhere. :)

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Commented] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942597#comment-16942597
 ] 

Antoine Pitrou commented on ARROW-6762:
---

Ok, the issue is that the JSON reader assumes the file always ends with a 
newline. Some JSON files may not have a newline at the end of the last line.

So it would not crash in release mode (it's a debug assertion), but probably 
produce the wrong result.
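In other words, a robust newline-delimited JSON reader must treat the chunk after the last newline as a regular record rather than assert it is whitespace. A minimal pure-Python sketch of the tolerant behaviour (illustrative; not the Arrow C++ reader):

```python
import json

def read_ndjson(text):
    """Parse newline-delimited JSON, tolerating a missing final newline."""
    records = []
    for line in text.split("\n"):
        line = line.strip()
        if line:  # a file ending in '\n' leaves an empty final chunk; skip it
            records.append(json.loads(line))
    return records
```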

> [C++] JSON reader segfaults on newline
> --
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: json
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>...: import pyarrow.parquet as pq 
>...:  
>...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
> (string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
> (string_view::npos) 
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
> ...: import pyarrow.parquet as pq 
> ...:  
> ...: r = json.read_json('SampleRecord.jl')
>   
>
> In [25]: r
>   
>
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: 
> string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, 
> volume_unit: string, buyer: null, seller: null>
>   child 0, movement: string
>   child 1, product: string
>   child 2, volume: string
>   child 3, volume_unit: string
>   child 4, buyer: null
>   child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
> string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
> null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__   
>   
>
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]





[jira] [Created] (ARROW-6763) [Python] Parquet s3 tests are skipped because dependencies are not installed

2019-10-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6763:


 Summary: [Python] Parquet s3 tests are skipped because 
dependencies are not installed
 Key: ARROW-6763
 URL: https://issues.apache.org/jira/browse/ARROW-6763
 Project: Apache Arrow
  Issue Type: Test
  Components: Python
Reporter: Joris Van den Bossche


Currently the s3 parquet test is skipped on both Travis and ursabot.
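A likely mechanism (an assumption here, not verified against the test suite): the tests guard on the optional s3 dependencies being importable and silently skip otherwise, so a missing package never fails the build. A minimal sketch of such a guard:

```python
import importlib.util

def has_module(name):
    """The skip condition: True only when the optional dependency imports."""
    return importlib.util.find_spec(name) is not None

# In a pytest suite this would typically back a skipif marker, e.g.
# @pytest.mark.skipif(not has_module("s3fs"), reason="s3fs not installed")
```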





[jira] [Resolved] (ARROW-6752) [Go] implement Stringer for Null array

2019-10-02 Thread Sebastien Binet (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-6752.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 
[https://github.com/apache/arrow/pull/]

> [Go] implement Stringer for Null array
> --
>
> Key: ARROW-6752
> URL: https://issues.apache.org/jira/browse/ARROW-6752
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942582#comment-16942582
 ] 

Joris Van den Bossche commented on ARROW-6760:
--

Indeed, a better error message would be nice. I've renamed the issue to 
reflect this.

> [C++] JSON: improve error message when column changed type
> --
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: dummy.jl
>
>
> When a column accidentally changes type in a JSON file (which is not 
> supported), it would be nice to get the column name that gives this problem 
> in the error message.
> ---
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Updated] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-6760:
-
Description: 
When a column accidentally changes type in a JSON file (which is not 
supported), it would be nice to get the column name that gives this problem in 
the error message.

---

I am trying to parse a simple JSON file. While doing so, I am getting the error  
{{JSON parse error: A column changed from string to number}}

{code}

from pyarrow import json
r = json.read_json('dummy.jl')

{code}

 

  was:
I am trying to parse a simple JSON file. While doing so, I am getting the error  
{{JSON parse error: A column changed from string to number}}

{code}

from pyarrow import json
r = json.read_json('dummy.jl')

{code}

 


> [C++] JSON: improve error message when column changed type
> --
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: dummy.jl
>
>
> When a column accidentally changes type in a JSON file (which is not 
> supported), it would be nice to get the column name that gives this problem 
> in the error message.
> ---
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Updated] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-6760:
-
Summary: [C++] JSON: improve error message when column changed type  (was: 
JSON parse error: A column changed from string to number)

> [C++] JSON: improve error message when column changed type
> --
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: dummy.jl
>
>
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942578#comment-16942578
 ] 

Kenta Murata commented on ARROW-4225:
-

@rok No, I didn't. I guess there is no library that has a CSF tensor 
implementation yet, although the paper exists, so I'm waiting for 
pydata/sparse's implementation.

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Commented] (ARROW-6760) JSON parse error: A column changed from string to number

2019-10-02 Thread harikrishnan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942574#comment-16942574
 ] 

harikrishnan commented on ARROW-6760:
-

Ah I see. Thanks for the quick reply [~apitrou]. Yes, definitely, listing the 
column name in the error message will be a real time-saver when it comes to 
debugging.

> JSON parse error: A column changed from string to number
> 
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: dummy.jl
>
>
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942568#comment-16942568
 ] 

Rok Mihevc commented on ARROW-4225:
---

That's great @mrkn! Did you also start on 
[CSF|https://issues.apache.org/jira/browse/ARROW-4226]? If not, I'll pick it up.

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Assigned] (ARROW-4226) [C++] Add CSF sparse tensor support

2019-10-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata reassigned ARROW-4226:
---

Assignee: Kenta Murata

> [C++] Add CSF sparse tensor support
> ---
>
> Key: ARROW-4226
> URL: https://issues.apache.org/jira/browse/ARROW-4226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> [https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]
> {quote}Perhaps in the future, if zero-copy and future-proof-ness is really 
> what we want, we might want to add the CSF (compressed sparse fiber) format, 
> a generalisation of CSR/CSC. I'm currently working on adding it to 
> PyData/Sparse, and I plan to make it the preferred format (COO will still be 
> around though).
> {quote}





[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942567#comment-16942567
 ] 

Joris Van den Bossche commented on ARROW-6737:
--

I noticed that reading this file on master actually gives problems, while it 
works on 0.14.1, so I opened ARROW-6762 for that.

> Nested column branch had multiple children
> --
>
> Key: ARROW-6737
> URL: https://issues.apache.org/jira/browse/ARROW-6737
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: SampleRecord.jl
>
>
> {code}
> from pyarrow import json
> import pyarrow.parquet as pq
> r = json.read_json('example.jl')
> pq.write_table(r, 'example.parquet')
> {code}
> Doing the above operation results in {{ArrowInvalid: Nested column branch 
> had multiple children}}
> Posting it here as per the request from 
> https://github.com/apache/arrow/issues/4045#issuecomment-535867640
> The sample schema looks like this
> {code}
> package_version: string
> source_version: string
> uuid: string
> _type: string
> position: struct<ais_type: string, course: double, draught: double, 
> draught_raw: null, heading: double, lat: double, lon: double, nav_state: 
> int64, received_time: timestamp[s], speed: double>
>  child 0, ais_type: string
>  child 1, course: double
>  child 2, draught: double
>  child 3, draught_raw: null
>  child 4, heading: double
>  child 5, lat: double
>  child 6, lon: double
>  child 7, nav_state: int64
>  child 8, received_time: timestamp[s]
>  child 9, speed: double
> provider_name: string
> vessel: struct<beam: null, build_year: null, call_sign: string, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: string, gross_tonnage: null, 
> imo: string, length: null, mmsi: string, name: string, type: null, 
> vessel_type: string>
>  child 0, beam: null
>  child 1, build_year: null
>  child 2, call_sign: string
>  child 3, dead_weight: null
>  child 4, dwt: null
>  child 5, flag_code: null
>  child 6, flag_name: string
>  child 7, gross_tonnage: null
>  child 8, imo: string
>  child 9, length: null
>  child 10, mmsi: string
>  child 11, name: string
>  child 12, type: null
>  child 13, vessel_type: string
> source_provider: string
> {code}
>  
>  





[jira] [Assigned] (ARROW-4222) [C++] Support equality comparison between COO and CSR sparse tensors in SparseTensorEquals

2019-10-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata reassigned ARROW-4222:
---

Assignee: Kenta Murata

> [C++] Support equality comparison between COO and CSR sparse tensors in 
> SparseTensorEquals
> --
>
> Key: ARROW-4222
> URL: https://issues.apache.org/jira/browse/ARROW-4222
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 2.0.0
>
>
> Currently SparseTensorEquals always returns false when it gets COO and CSR 
> sparse tensors.
> It should support comparing the items in the sparse tensors.
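One way to compare across formats is to canonicalise both tensors to their map of nonzero coordinates to values. A minimal pure-Python sketch (assumed data layouts; not the Arrow C++ SparseTensor types):

```python
def coo_items(rows, cols, data):
    """Canonical form of a COO matrix: its nonzero (coords -> value) map."""
    return {(r, c): v for r, c, v in zip(rows, cols, data) if v != 0}

def csr_items(indptr, indices, data):
    """Canonical form of a CSR matrix: expand row offsets back to coords."""
    items = {}
    for r in range(len(indptr) - 1):
        for k in range(indptr[r], indptr[r + 1]):
            if data[k] != 0:
                items[(r, indices[k])] = data[k]
    return items

def sparse_equal(coo, csr):
    """Format-independent equality: same nonzero items => equal tensors."""
    return coo_items(*coo) == csr_items(*csr)
```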





[jira] [Assigned] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata reassigned ARROW-4225:
---

Assignee: Kenta Murata  (was: Rok Mihevc)

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Created] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6762:


 Summary: [C++] JSON reader segfaults on newline
 Key: ARROW-6762
 URL: https://issues.apache.org/jira/browse/ARROW-6762
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Joris Van den Bossche


Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that trying 
to read this file on master results in a segfault:

{code}
In [1]: from pyarrow import json 
   ...: import pyarrow.parquet as pq 
   ...:  
   ...: r = json.read_json('SampleRecord.jl') 
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: 
(string_view(*next_partial).find_first_not_of(" \t\n\r")) == 
(string_view::npos) 
*** Check failure stack trace: ***
Aborted (core dumped)
{code}

while with 0.14.1 this works fine:

{code}
In [24]: from pyarrow import json 
...: import pyarrow.parquet as pq 
...:  
...: r = json.read_json('SampleRecord.jl')  

   

In [25]: r  

   
Out[25]: 
pyarrow.Table
_type: string
provider_name: string
arrival: timestamp[s]
berthed: timestamp[s]
berth: null
cargoes: list<item: struct<movement: string, product: string, volume: string, 
volume_unit: string, buyer: null, seller: null>>
  child 0, item: struct<movement: string, product: string, volume: string, 
volume_unit: string, buyer: null, seller: null>
  child 0, movement: string
  child 1, product: string
  child 2, volume: string
  child 3, volume_unit: string
  child 4, buyer: null
  child 5, seller: null
departure: timestamp[s]
eta: null
installation: null
port_name: string
next_zone: null
reported_date: timestamp[s]
shipping_agent: null
vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: 
null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: 
string, length: int64, mmsi: null, name: string, type: null, vessel_type: 
null>
  child 0, beam: null
  child 1, build_year: null
  child 2, call_sign: null
  child 3, dead_weight: null
  child 4, dwt: null
  child 5, flag_code: null
  child 6, flag_name: null
  child 7, gross_tonnage: null
  child 8, imo: string
  child 9, length: int64
  child 10, mmsi: null
  child 11, name: string
  child 12, type: null
  child 13, vessel_type: null

In [26]: pa.__version__ 

   
Out[26]: '0.14.1'
{code}

cc [~apitrou] [~bkietz]





[jira] [Commented] (ARROW-6760) JSON parse error: A column changed from string to number

2019-10-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942566#comment-16942566
 ] 

Antoine Pitrou commented on ARROW-6760:
---

Hmm, we should probably give better error messages. [~bkietz]

In this case, though, it seems the "length" field is first a string, then an 
integer. Arrow only accepts homogeneous JSON, i.e. all objects in the same file 
must have the same schema.
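The requested diagnostic amounts to tracking each column's type across records and naming the column that changes. A minimal pure-Python sketch (illustrative; not the Arrow C++ parser):

```python
import json

def check_column_types(lines):
    """Track each column's JSON type and name the column that changes."""
    seen = {}
    for lineno, line in enumerate(lines, 1):
        for key, value in json.loads(line).items():
            t = type(value).__name__
            if key in seen and seen[key] != t:
                raise ValueError(f"column {key!r} changed from "
                                 f"{seen[key]} to {t} on line {lineno}")
            seen[key] = t
```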

> JSON parse error: A column changed from string to number
> 
>
> Key: ARROW-6760
> URL: https://issues.apache.org/jira/browse/ARROW-6760
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: dummy.jl
>
>
> I am trying to parse a simple JSON file. While doing so, I am getting the error 
>  {{JSON parse error: A column changed from string to number}}
> {code}
> from pyarrow import json
> r = json.read_json('dummy.jl')
> {code}
>  





[jira] [Commented] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942565#comment-16942565
 ] 

Kenta Murata commented on ARROW-4225:
-

@rok I've already started to work on this ticket.  I'm sorry for forgetting to 
update the ticket property.

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Assigned] (ARROW-4225) [C++] Add CSC sparse matrix support

2019-10-02 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-4225:
-

Assignee: Rok Mihevc

> [C++] Add CSC sparse matrix support
> ---
>
> Key: ARROW-4225
> URL: https://issues.apache.org/jira/browse/ARROW-4225
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> CSC sparse matrix is necessary for integration with existing sparse matrix 
> libraries (umfpack, superlu). 
> https://github.com/apache/arrow/pull/2546#issuecomment-422135645





[jira] [Commented] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017

2019-10-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942562#comment-16942562
 ] 

Antoine Pitrou commented on ARROW-6757:
---

Does the debugger tell you anything?

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access 
> violation" with Visual Studio 2017
> ---
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> I encountered this when trying to verify the release with MSVC 2017. It may 
> be particular to this machine or build (though it's 100% reproducible for 
> me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) 
> [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}





[jira] [Closed] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche closed ARROW-6737.

Resolution: Duplicate

> Nested column branch had multiple children
> --
>
> Key: ARROW-6737
> URL: https://issues.apache.org/jira/browse/ARROW-6737
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: SampleRecord.jl
>
>
> {code}
> from pyarrow import json
> import pyarrow.parquet as pq
> r = json.read_json('example.jl')
> pq.write_table(r, 'example.parquet')
> {code}
> Doing the above operation results in {{ArrowInvalid: Nested column branch 
> had multiple children}}
> Posting it here as per the request from 
> https://github.com/apache/arrow/issues/4045#issuecomment-535867640
> The sample schema looks like this
> {code}
> package_version: string
> source_version: string
> uuid: string
> _type: string
> position: struct<ais_type: string, course: double, draught: double, 
> draught_raw: null, heading: double, lat: double, lon: double, nav_state: 
> int64, received_time: timestamp[s], speed: double>
>  child 0, ais_type: string
>  child 1, course: double
>  child 2, draught: double
>  child 3, draught_raw: null
>  child 4, heading: double
>  child 5, lat: double
>  child 6, lon: double
>  child 7, nav_state: int64
>  child 8, received_time: timestamp[s]
>  child 9, speed: double
> provider_name: string
> vessel: struct<beam: null, build_year: null, call_sign: string, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: string, gross_tonnage: null, 
> imo: string, length: null, mmsi: string, name: string, type: null, 
> vessel_type: string>
>  child 0, beam: null
>  child 1, build_year: null
>  child 2, call_sign: string
>  child 3, dead_weight: null
>  child 4, dwt: null
>  child 5, flag_code: null
>  child 6, flag_name: string
>  child 7, gross_tonnage: null
>  child 8, imo: string
>  child 9, length: null
>  child 10, mmsi: string
>  child 11, name: string
>  child 12, type: null
>  child 13, vessel_type: string
> source_provider: string
> {code}
>  
>  





[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942560#comment-16942560
 ] 

Joris Van den Bossche commented on ARROW-6737:
--

Thanks for providing the sample file. This is indeed a duplicate of ARROW-1644. 
Nested lists/structs are not yet supported in the Arrow parquet IO 
implementation.
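Until nested types are supported, one common workaround (an assumption here, not an official recommendation) is to flatten nested structs into dotted top-level columns before writing. A minimal sketch operating on plain dicts:

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, as a workaround for
    writers that reject nested struct columns."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))  # recurse into structs
        else:
            flat[name] = value
    return flat
```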

> Nested column branch had multiple children
> --
>
> Key: ARROW-6737
> URL: https://issues.apache.org/jira/browse/ARROW-6737
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: harikrishnan
>Priority: Major
> Attachments: SampleRecord.jl
>
>
> {code}
> from pyarrow import json
> import pyarrow.parquet as pq
> r = json.read_json('example.jl')
> pq.write_table(r, 'example.parquet')
> {code}
> Doing the above operation results in {{ArrowInvalid: Nested column branch 
> had multiple children}}
> Posting it here as per the request from 
> https://github.com/apache/arrow/issues/4045#issuecomment-535867640
> The sample schema looks like this
> {code}
> package_version: string
> source_version: string
> uuid: string
> _type: string
> position: struct<ais_type: string, course: double, draught: double, 
> draught_raw: null, heading: double, lat: double, lon: double, nav_state: 
> int64, received_time: timestamp[s], speed: double>
>  child 0, ais_type: string
>  child 1, course: double
>  child 2, draught: double
>  child 3, draught_raw: null
>  child 4, heading: double
>  child 5, lat: double
>  child 6, lon: double
>  child 7, nav_state: int64
>  child 8, received_time: timestamp[s]
>  child 9, speed: double
> provider_name: string
> vessel: struct<beam: null, build_year: null, call_sign: string, dead_weight: 
> null, dwt: null, flag_code: null, flag_name: string, gross_tonnage: null, 
> imo: string, length: null, mmsi: string, name: string, type: null, 
> vessel_type: string>
>  child 0, beam: null
>  child 1, build_year: null
>  child 2, call_sign: string
>  child 3, dead_weight: null
>  child 4, dwt: null
>  child 5, flag_code: null
>  child 6, flag_name: string
>  child 7, gross_tonnage: null
>  child 8, imo: string
>  child 9, length: null
>  child 10, mmsi: string
>  child 11, name: string
>  child 12, type: null
>  child 13, vessel_type: string
> source_provider: string
> {code}
>  
>  





[jira] [Updated] (ARROW-6761) [Rust] Travis CI builds not respecting rust-toolchain

2019-10-02 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6761:
--
Summary: [Rust] Travis CI builds not respecting rust-toolchain  (was: 
[Rust] Builds failing due to Rust Internal Compiler Error)

> [Rust] Travis CI builds not respecting rust-toolchain
> -
>
> Key: ARROW-6761
> URL: https://issues.apache.org/jira/browse/ARROW-6761
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Travis builds recently started failing with a Rust ICE (Internal Compiler 
> Error) which has been reported to the Rust compiler team 
> ([https://github.com/rust-lang/rust/issues/64908]).
>  
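
One way a CI script can guard against a build image that ignores the pin is to compare the active {{rustc}} banner against the repository's {{rust-toolchain}} file. A minimal sketch; the helper name and the sample pin are illustrative, and in a real CI step the second argument would come from running {{rustc --version}}:

```python
def toolchain_matches(pin: str, rustc_version: str) -> bool:
    """Return True if the rustc version banner mentions the pinned channel/date."""
    pin = pin.strip()
    # A dated pin like "nightly-2019-09-25" surfaces in the banner as the
    # channel name plus the build date, so check both components.
    parts = pin.split("-", 1)
    return all(part in rustc_version for part in parts)

# Example: a dated nightly pin against a matching rustc banner.
print(toolchain_matches("nightly-2019-09-25",
                        "rustc 1.40.0-nightly (abc123 2019-09-25)"))
```

Failing fast on a mismatch turns a confusing ICE into an actionable "wrong toolchain" error.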





[jira] [Resolved] (ARROW-6730) [CI] Use GitHub Actions for "C++ with clang 7" docker image

2019-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-6730.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5530
[https://github.com/apache/arrow/pull/5530]

> [CI] Use GitHub Actions for "C++ with clang 7" docker image
> ---
>
> Key: ARROW-6730
> URL: https://issues.apache.org/jira/browse/ARROW-6730
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-6730) [CI] Use GitHub Actions for "C++ with clang 7" docker image

2019-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-6730:
---

Assignee: Francois Saint-Jacques

> [CI] Use GitHub Actions for "C++ with clang 7" docker image
> ---
>
> Key: ARROW-6730
> URL: https://issues.apache.org/jira/browse/ARROW-6730
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6730) [CI] Use GitHub Actions for "C++ with clang 7" docker image

2019-10-02 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-6730:

Summary: [CI] Use GitHub Actions for "C++ with clang 7" docker image  (was: 
[CI] Use Github Actions for "C++ with clang 7" docker image)

> [CI] Use GitHub Actions for "C++ with clang 7" docker image
> ---
>
> Key: ARROW-6730
> URL: https://issues.apache.org/jira/browse/ARROW-6730
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






  1   2   >