[jira] [Created] (ARROW-9033) [Python] Add tests to verify that one can build a C++ extension against the manylinux1 wheels

2020-06-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9033: --- Summary: [Python] Add tests to verify that one can build a C++ extension against the manylinux1 wheels Key: ARROW-9033 URL: https://issues.apache.org/jira/browse/ARROW-9033

[jira] [Created] (ARROW-9032) [C++] Split arrow/util/bit_util.h into multiple header files

2020-06-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9032: --- Summary: [C++] Split arrow/util/bit_util.h into multiple header files Key: ARROW-9032 URL: https://issues.apache.org/jira/browse/ARROW-9032 Project: Apache Arrow

[jira] [Created] (ARROW-9031) [R] Implement conversion from Type::UINT64 to R vector

2020-06-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9031: --- Summary: [R] Implement conversion from Type::UINT64 to R vector Key: ARROW-9031 URL: https://issues.apache.org/jira/browse/ARROW-9031 Project: Apache Arrow

[jira] [Created] (ARROW-9030) [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx

2020-06-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9030: --- Summary: [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx Key: ARROW-9030 URL: https://issues.apache.org/jira/browse/ARROW-9030

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
On Wed, Jun 3, 2020 at 11:25 AM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs > wrote: > > > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > > wrote: > > > > > > > > From the user perspective I

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
On Wed, Jun 3, 2020 at 11:16 AM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > wrote: > > > > > > From the user perspective I find the following pretty confusing: > > > > > > In [1]: np.array([-128, 127],

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Krisztián Szűcs
On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > wrote: > > > > > > From the user perspective I find the following pretty confusing: > > > > > > In [1]: np.array([-128, 127],

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Krisztián Szűcs
On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > wrote: > > > > From the user perspective I find the following pretty confusing: > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > > Out[1]: array([ 0, -2], dtype=int8) > > > > In

[jira] [Created] (ARROW-9029) [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data

2020-06-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9029: --- Summary: [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data Key: ARROW-9029 URL: https://issues.apache.org/jira/browse/ARROW-9029

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs wrote: > > From the user perspective I find the following pretty confusing: > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > Out[1]: array([ 0, -2], dtype=int8) > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > Out[2]: array([-256,

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Krisztián Szűcs
From the user perspective I find the following pretty confusing: In [1]: np.array([-128, 127], dtype=np.int8()) * 2 Out[1]: array([ 0, -2], dtype=int8) In [2]: np.array([-128, 127], dtype=np.int16()) * 2 Out[2]: array([-256, 254], dtype=int16) In my opinion somewhere (on a higher level maybe)
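
The int8 and int16 results quoted in this message can be reproduced without NumPy. What follows is only a minimal sketch, assuming fixed-width two's-complement wraparound; `wrap_signed` and `multiply_wrapping` are hypothetical helper names, not NumPy or Arrow functions.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask  # keep only the low `bits` bits
    # A set top bit means the stored pattern represents a negative number.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def multiply_wrapping(values, factor, bits):
    """Multiply each element, wrapping silently instead of raising."""
    return [wrap_signed(v * factor, bits) for v in values]


# Same inputs as the quoted session: [-128, 127] * 2
int8_result = multiply_wrapping([-128, 127], 2, bits=8)
int16_result = multiply_wrapping([-128, 127], 2, bits=16)
print(int8_result, int16_result)
```

With 8 bits the products -256 and 254 do not fit and wrap around; with 16 bits they fit and are returned unchanged, which is the inconsistency the message points at.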

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
On Wed, Jun 3, 2020 at 10:44 AM Wes McKinney wrote: > > > By default an error should probably be raised > > I would very strongly recommend keeping the behavior consistent with > that of analytic DBMSes. I don't think that most analytic DBMS error > on overflows because it's too computationally

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
> By default an error should probably be raised I would very strongly recommend keeping the behavior consistent with that of analytic DBMSes. I don't think that most analytic DBMS error on overflows because it's too computationally expensive to check. NumPy doesn't error (by default at least)
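
To make the cost argument concrete: a kernel that errors on overflow has to compare every result against the limits of the output type. The sketch below is plain Python, not Arrow's implementation; `checked_add` and its `bits` parameter are hypothetical names introduced only for illustration.

```python
def checked_add(left, right, bits=8):
    """Element-wise signed add that raises instead of wrapping.

    Hypothetical helper: the per-element range test is the extra work
    that an unchecked (wrapping) kernel would skip.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    out = []
    for a, b in zip(left, right):
        total = a + b
        if not lo <= total <= hi:
            raise OverflowError(f"{a} + {b} does not fit in int{bits}")
        out.append(total)
    return out


print(checked_add([1, 2], [3, 4]))  # every sum fits in int8
```

Calling `checked_add([127], [1])` raises `OverflowError`, whereas a wrapping kernel would return a value without any check.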

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Antoine Pitrou
On Wed, 3 Jun 2020 10:47:38 -0400 Ben Kietzman wrote: > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > How should arithmetic kernels handle integer overflow? > > The approach currently taken in the linked PR is to promote such that > overflow will not occur, for example

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Wes McKinney
What do open source analytic database systems do? I don't think we should deviate from the behavior of these systems. For example, you can see that Apache Impala uses unsigned arithmetic on signed integers

Re: Writing Parquet datasets using pyarrow.parquet.ParquetWriter

2020-06-03 Thread Joris Van den Bossche
Hi Palak, The ParquetWriter class is meant to write a single parquet file (so in that sense, that you see only a single parquet file being written based on the shown code, that is expected). If you want to write multiple files, you can either manually create multiple ParquetWriter instances

[DISCUSS] Add kernel integer overflow handling

2020-06-03 Thread Ben Kietzman
https://github.com/apache/arrow/pull/7341#issuecomment-638241193 How should arithmetic kernels handle integer overflow? The approach currently taken in the linked PR is to promote such that overflow will not occur, for example `(int8, int8)->int16` and `(uint16, uint16)->uint32`. I'm not sure
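
The promotion approach described in this message can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked PR: only the `(int8, int8)->int16` and `(uint16, uint16)->uint32` pairs come from the message, while `PROMOTED`, `RANGES` and `add_promoting` are hypothetical names.

```python
# Value ranges of the types used in this sketch.
RANGES = {
    "int8": (-128, 127),
    "int16": (-32768, 32767),
    "uint16": (0, 65535),
    "uint32": (0, 4294967295),
}

# The two promotions named in the message; nothing else is assumed.
PROMOTED = {"int8": "int16", "uint16": "uint32"}


def add_promoting(left, right, dtype):
    """Add two same-typed columns and report the widened output type."""
    out_type = PROMOTED[dtype]
    lo, hi = RANGES[out_type]
    result = [a + b for a, b in zip(left, right)]
    # The widened type holds the sum of any two values of the input type.
    assert all(lo <= r <= hi for r in result)
    return result, out_type


print(add_promoting([127, -128], [127, -128], "int8"))
print(add_promoting([65535], [65535], "uint16"))
```

Because the output type is one step wider than the inputs, the extreme sums (254 and -256 for int8, 131070 for uint16) are representable and no overflow check is needed at run time; the trade-off is that the result type differs from the input type.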

Re: [DISCUSS] Adding "byteWidth" field to Decimal Flatbuffers type for forward compatibility

2020-06-03 Thread Antoine Pitrou
Sounds good to me. Regards Antoine. On Mon, 1 Jun 2020 17:47:38 -0500 Wes McKinney wrote: > I mentioned this on the recent sync call and opened > > https://issues.apache.org/jira/browse/ARROW-8985 > > I believe at some point that Arrow may need to be used to transport > decimal widths

[jira] [Created] (ARROW-9028) [R] Should be able to convert an empty table

2020-06-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9028: - Summary: [R] Should be able to convert an empty table Key: ARROW-9028 URL: https://issues.apache.org/jira/browse/ARROW-9028 Project: Apache Arrow

[jira] [Created] (ARROW-9027) [Python] Split in multiple files + clean-up pyarrow.parquet tests

2020-06-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-9027: Summary: [Python] Split in multiple files + clean-up pyarrow.parquet tests Key: ARROW-9027 URL: https://issues.apache.org/jira/browse/ARROW-9027

[jira] [Created] (ARROW-9026) [C++/Python] Force package removal from arrow-nightlies conda repository

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9026: --- Summary: [C++/Python] Force package removal from arrow-nightlies conda repository Key: ARROW-9026 URL: https://issues.apache.org/jira/browse/ARROW-9026 Project: Apache Arrow

[jira] [Created] (ARROW-9025) Apache arrow fails to build with recent version of protobuf?

2020-06-03 Thread Keith Hughitt (Jira)
Keith Hughitt created ARROW-9025: Summary: Apache arrow fails to build with recent version of protobuf? Key: ARROW-9025 URL: https://issues.apache.org/jira/browse/ARROW-9025 Project: Apache Arrow

[jira] [Created] (ARROW-9024) [C++/Python] Install anaconda-client in conda-clean job

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9024: --- Summary: [C++/Python] Install anaconda-client in conda-clean job Key: ARROW-9024 URL: https://issues.apache.org/jira/browse/ARROW-9024 Project: Apache Arrow Issue

[jira] [Created] (ARROW-9023) [C++] Use mimalloc conda package

2020-06-03 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9023: --- Summary: [C++] Use mimalloc conda package Key: ARROW-9023 URL: https://issues.apache.org/jira/browse/ARROW-9023 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-9022) [C++][Compute] Make Add function safe for numeric limits

2020-06-03 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-9022: -- Summary: [C++][Compute] Make Add function safe for numeric limits Key: ARROW-9022 URL: https://issues.apache.org/jira/browse/ARROW-9022 Project: Apache Arrow

[NIGHTLY] Arrow Build Report for Job nightly-2020-06-03-0

2020-06-03 Thread Crossbow
Arrow Build Report for Job nightly-2020-06-03-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0 Failed Tasks: - centos-8-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-centos-8-aarch64 -

[jira] [Created] (ARROW-9021) [Python] The filesystem keyword in parquet.read_table is not documented

2020-06-03 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-9021: Summary: [Python] The filesystem keyword in parquet.read_table is not documented Key: ARROW-9021 URL: https://issues.apache.org/jira/browse/ARROW-9021