[jira] [Created] (ARROW-8228) [C++][Parquet] Support writing lists that have null elements that are non-empty.

2020-03-25 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8228: Summary: [C++][Parquet] Support writing lists that have null elements that are non-empty. Key: ARROW-8228 URL: https://issues.apache.org/jira/browse/ARROW-8228

[jira] [Created] (ARROW-8227) [C++] Propose refining SIMD code framework

2020-03-25 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-8227: Summary: [C++] Propose refining SIMD code framework Key: ARROW-8227 URL: https://issues.apache.org/jira/browse/ARROW-8227 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-8226) [Go] Add binary builder that uses 64 bit offsets and make binary builders resettable

2020-03-25 Thread Richard (Jira)
Richard created ARROW-8226: Summary: [Go] Add binary builder that uses 64 bit offsets and make binary builders resettable Key: ARROW-8226 URL: https://issues.apache.org/jira/browse/ARROW-8226 Project:

[jira] [Created] (ARROW-8224) [C++] Remove APIs deprecated prior to 0.16.0

2020-03-25 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8224: Summary: [C++] Remove APIs deprecated prior to 0.16.0 Key: ARROW-8224 URL: https://issues.apache.org/jira/browse/ARROW-8224 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-8223) Schema.from_pandas breaks with pandas nullable integer dtype

2020-03-25 Thread Ged Steponavicius (Jira)
Ged Steponavicius created ARROW-8223: Summary: Schema.from_pandas breaks with pandas nullable integer dtype Key: ARROW-8223 URL: https://issues.apache.org/jira/browse/ARROW-8223 Project: Apache

[jira] [Created] (ARROW-8222) [C++] Use bcp to make a slim boost for bundled build

2020-03-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8222: Summary: [C++] Use bcp to make a slim boost for bundled build Key: ARROW-8222 URL: https://issues.apache.org/jira/browse/ARROW-8222 Project: Apache Arrow

Re: Preparing for 0.17.0 Arrow release

2020-03-25 Thread Andy Grove
I just took a first pass at reviewing the Java and Rust issues and removed some from the 0.17.0 release. There are a few small Rust issues that I am actively working on for this release. Thanks. On Wed, Mar 25, 2020 at 1:13 PM Wes McKinney wrote: > hi Neal, > > Thanks for helping coordinate.

Re: Preparing for 0.17.0 Arrow release

2020-03-25 Thread Wes McKinney
hi Neal, Thanks for helping coordinate. I agree we should be in a position to release sometime next week. Can folks from the Rust and Java side review issues in the backlog? According to the dashboard there are 19 Rust issues open and 7 Java issues. Thanks On Tue, Mar 24, 2020 at 10:01 AM Neal

[jira] [Created] (ARROW-8220) [Python] Make dataset FileFormat objects serializable

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8220: Summary: [Python] Make dataset FileFormat objects serializable Key: ARROW-8220 URL: https://issues.apache.org/jira/browse/ARROW-8220 Project: Apache

[jira] [Created] (ARROW-8219) [Rust] sqlparser crate needs to be bumped to version 0.2.5

2020-03-25 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8219: Summary: [Rust] sqlparser crate needs to be bumped to version 0.2.5 Key: ARROW-8219 URL: https://issues.apache.org/jira/browse/ARROW-8219 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-25 Thread Micah Kornfield
If it isn't hard, could you run with batch sizes of 1024 or 2048 records? I think a question was raised previously about whether there is a benefit for smaller-sized buffers. Thanks, Micah On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney wrote: > On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield >

[jira] [Created] (ARROW-8218) [C++] Parallelize decompression at field level in experimental IPC compression code

2020-03-25 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8218: Summary: [C++] Parallelize decompression at field level in experimental IPC compression code Key: ARROW-8218 URL: https://issues.apache.org/jira/browse/ARROW-8218

[jira] [Created] (ARROW-8217) [R][C++] Fix crashing data in test-dataset.R on 32-bit Windows from ARROW-7979

2020-03-25 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8217: Summary: [R][C++] Fix crashing data in test-dataset.R on 32-bit Windows from ARROW-7979 Key: ARROW-8217 URL: https://issues.apache.org/jira/browse/ARROW-8217 Project:

[jira] [Created] (ARROW-8216) filter method for Dataset doesn't distinguish between empty strings and NAs

2020-03-25 Thread Sam Albers (Jira)
Sam Albers created ARROW-8216: Summary: filter method for Dataset doesn't distinguish between empty strings and NAs Key: ARROW-8216 URL: https://issues.apache.org/jira/browse/ARROW-8216 Project: Apache

[jira] [Created] (ARROW-8215) [CI][Glib] Meson install fails in the macOS build

2020-03-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8215: Summary: [CI][Glib] Meson install fails in the macOS build Key: ARROW-8215 URL: https://issues.apache.org/jira/browse/ARROW-8215 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-25 Thread Wes McKinney
On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield wrote: > > > > > Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on > > the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae > > dataset. So that's a huge space savings > > One more question on this. What was the

[jira] [Created] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8213: Summary: [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message Key: ARROW-8213 URL:

[jira] [Created] (ARROW-8212) [Python][Dataset] Consider adding Cast like operation

2020-03-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8212: Summary: [Python][Dataset] Consider adding Cast like operation Key: ARROW-8212 URL: https://issues.apache.org/jira/browse/ARROW-8212 Project: Apache Arrow

[jira] [Created] (ARROW-8211) [C++] Sanitize hdfs host when creating HadoopFileSystem from endpoint

2020-03-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8211: Summary: [C++] Sanitize hdfs host when creating HadoopFileSystem from endpoint Key: ARROW-8211 URL: https://issues.apache.org/jira/browse/ARROW-8211 Project:

[jira] [Created] (ARROW-8209) [Python] Accessing duplicate column of Table by name gives wrong error

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8209: Summary: [Python] Accessing duplicate column of Table by name gives wrong error Key: ARROW-8209 URL: https://issues.apache.org/jira/browse/ARROW-8209

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-25-0

2020-03-25 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-25-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-25-0 Failed Tasks: - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-25-0-travis-gandiva-jar-trusty -

[jira] [Created] (ARROW-8208) [PYTHON] RowGroup filtering with ParquetDataset

2020-03-25 Thread Christophe Clienti (Jira)
Christophe Clienti created ARROW-8208: Summary: [PYTHON] RowGroup filtering with ParquetDataset Key: ARROW-8208 URL: https://issues.apache.org/jira/browse/ARROW-8208 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-25 Thread Sebastien Binet
On Wed, Mar 25, 2020 at 2:32 AM Wes McKinney wrote: > From what I've found searching on the internet > > - Java: > * ZSTD -- JNI-based library available > * LZ4 -- both JNI and native Java available > > - Go: ZSTD is a C binding, while there is an LZ4 native Go implementation > AFAIK, one has