[jira] [Created] (ARROW-9040) [Python][Parquet]"_ParquetDatasetV2" fail to read with columns and use_pandas_metadata=True

2020-06-04 Thread cmsxbc (Jira)
cmsxbc created ARROW-9040: Summary: [Python][Parquet]"_ParquetDatasetV2" fail to read with columns and use_pandas_metadata=True Key: ARROW-9040 URL: https://issues.apache.org/jira/browse/ARROW-9040 Project:

Unsubscribe

2020-06-04 Thread Zhuo Jia Dai
-- ZJ zhuojia@gmail.com

[jira] [Created] (ARROW-9039) py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions

2020-06-04 Thread Yoav Git (Jira)
Yoav Git created ARROW-9039: Summary: py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions Key: ARROW-9039 URL: https://issues.apache.org/jira/browse/ARROW-9039 Project:

[jira] [Created] (ARROW-9038) [C++] Improve BitBlockCounter

2020-06-04 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-9038: Summary: [C++] Improve BitBlockCounter Key: ARROW-9038 URL: https://issues.apache.org/jira/browse/ARROW-9038 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)

2020-06-04 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-9037: Summary: [C++/C-ABI] unable to import array with null count == -1 (which could be exported) Key: ARROW-9037 URL: https://issues.apache.org/jira/browse/ARROW-9037 Project:

[jira] [Created] (ARROW-9036) Null pointer exception when caching data frames)

2020-06-04 Thread Gaurangi Saxena (Jira)
Gaurangi Saxena created ARROW-9036: Summary: Null pointer exception when caching data frames) Key: ARROW-9036 URL: https://issues.apache.org/jira/browse/ARROW-9036 Project: Apache Arrow

Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-04 Thread Antoine Pitrou
On 04/06/2020 at 18:11, Rémi Dettai wrote: > > Ideally, we should be able to presize the array to a good enough > estimate. > You should be able to get away with a correct estimation because Parquet > column metadata contains the uncompressed size. But is there anything wrong > with this idea

Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-04 Thread Rémi Dettai
> Ideally, we should be able to presize the array to a good enough estimate. You should be able to get away with a correct estimation because Parquet column metadata contains the uncompressed size. But is there anything wrong with this idea of mmapping huge "runways" for our larger allocations?
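
To illustrate the presizing point above, here is a minimal Python sketch that simulates a growable buffer. It assumes a doubling growth policy and that every reallocation copies the old contents, which is a worst case and not a description of what Arrow or jemalloc actually do. The name `simulate_growth` and its parameters are hypothetical; the 8MB and 100MB figures are the ones mentioned in this thread.

```python
MB = 1024 * 1024


def simulate_growth(final_size, initial_capacity, growth_factor=2):
    """Return (reallocations, bytes_copied) needed to reach final_size.

    Assumption: each reallocation moves the buffer, so the whole old
    capacity is copied. Real allocators can sometimes grow in place.
    """
    capacity = initial_capacity
    reallocations = 0
    bytes_copied = 0
    while capacity < final_size:
        bytes_copied += capacity      # worst case: copy the old buffer
        capacity *= growth_factor     # hypothetical doubling policy
        reallocations += 1
    return reallocations, bytes_copied


# Growing from an 8MB guess versus presizing from a size estimate,
# such as the uncompressed size recorded in Parquet column metadata.
grown = simulate_growth(100 * MB, 8 * MB)
presized = simulate_growth(100 * MB, 100 * MB)
```

Under these assumptions the growing buffer is reallocated four times and copies 120MB in total, while the presized buffer is never reallocated. The sketch says nothing about the mmap "runway" idea, which depends on operating system behaviour.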

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-04 Thread Francois Saint-Jacques
I documented [1] the behaviors by experimentation or by reading the documentation. My experiments were mostly about checking INT64_MAX + 1. My preference would be to use the platform defined behavior by default and provide a safety option that errors. Feel free to add more databases/systems.
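
The two behaviours described above can be sketched in plain Python. This is only an illustration, assuming that the "platform defined behavior" is two's-complement wraparound for 64-bit integers; `add_wrapping` and `add_checked` are hypothetical names, not Arrow kernel functions.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def add_wrapping(a, b):
    """Add two int64 values with two's-complement wraparound (assumed default)."""
    return (a + b - INT64_MIN) % 2**64 + INT64_MIN


def add_checked(a, b):
    """Add two int64 values, raising instead of wrapping (the safety option)."""
    result = a + b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"int64 overflow in {a} + {b}")
    return result


# The experiment mentioned in the message: INT64_MAX + 1.
wrapped = add_wrapping(INT64_MAX, 1)   # wraps around to INT64_MIN
try:
    add_checked(INT64_MAX, 1)
except OverflowError as exc:
    error_message = str(exc)
```

The wrapping version costs nothing extra on typical hardware, while the checked version pays for a range test on every element, which is one reason to make it opt-in.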

Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-04 Thread Antoine Pitrou
On Thu, 4 Jun 2020 17:48:16 +0200 Rémi Dettai wrote: > When creating large arrays, Arrow uses realloc quite intensively. > > I have an example where I read a gzipped Parquet column (strings) that > expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc > cannot anticipate this and

[DISCUSS] [C++] custom allocator for large objects

2020-06-04 Thread Rémi Dettai
When creating large arrays, Arrow uses realloc quite intensively. I have an example where I read a gzipped Parquet column (strings) that expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc cannot anticipate this and every reallocate call above 1MB (the most critical ones) ends

[jira] [Created] (ARROW-9035) 8 vs 64 byte alignment

2020-06-04 Thread Anthony Abate (Jira)
Anthony Abate created ARROW-9035: Summary: 8 vs 64 byte alignment Key: ARROW-9035 URL: https://issues.apache.org/jira/browse/ARROW-9035 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-9034) [C++] Implement binary (two bitmap) version of BitBlockCounter

2020-06-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9034: Summary: [C++] Implement binary (two bitmap) version of BitBlockCounter Key: ARROW-9034 URL: https://issues.apache.org/jira/browse/ARROW-9034 Project: Apache Arrow

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-04 Thread Wes McKinney
On Thu, Jun 4, 2020 at 4:57 AM Krisztián Szűcs wrote: > > On Thu, Jun 4, 2020 at 11:09 AM Rémi Dettai wrote: > > > > It makes sense to me that the default behaviour of such a low-level API as > > kernel does not do any automagic promotion, but shouldn't this kind of > > promotion still be

[NIGHTLY] Arrow Build Report for Job nightly-2020-06-04-0

2020-06-04 Thread Crossbow
Arrow Build Report for Job nightly-2020-06-04-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-04-0 Failed Tasks: - centos-7-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-04-0-travis-centos-7-aarch64 -

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-04 Thread Krisztián Szűcs
On Thu, Jun 4, 2020 at 11:09 AM Rémi Dettai wrote: > > It makes sense to me that the default behaviour of such a low-level API as > kernel does not do any automagic promotion, but shouldn't this kind of > promotion still be requestable by the so-called "system developer" user? > Otherwise he

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-04 Thread Rémi Dettai
It makes sense to me that the default behaviour of such a low-level API as kernel does not do any automagic promotion, but shouldn't this kind of promotion still be requestable by the so-called "system developer" user? Otherwise he would need to materialize a promoted version of each original