Re: [DISCUSS] Rethinking our approach to scheduling CPU and IO work in C++?

2020-09-29 Thread Pierre Belzile
Hi, Some thoughts: 1. For async IO, the system must have threads that quickly service the callback. Otherwise the S3/GCS end will close the connection. A single thread pool where all the threads are doing an expensive compute operation (like CSV decoding or regex matching) can starve the IO. 2.
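To make point 1 concrete, here is a minimal sketch (not Arrow's actual scheduler) of keeping IO callbacks on their own small pool so that long-running compute tasks cannot starve them. The names `cpu_task`, `io_callback`, and `run_with_separate_pools` are hypothetical and exist only for illustration; the stand-in workloads are assumptions, not real CSV decoding or S3 reads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def cpu_task(n):
    # Stand-in for an expensive compute operation (e.g. CSV decoding).
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_callback(chunk_id):
    # Stand-in for servicing a completed read; reports which thread ran it.
    return chunk_id, threading.current_thread().name


def run_with_separate_pools(cpu_jobs, io_jobs, cpu_threads=2, io_threads=2):
    # Two independent pools: compute work can saturate cpu_pool
    # without ever occupying the threads reserved for IO callbacks.
    with ThreadPoolExecutor(max_workers=cpu_threads,
                            thread_name_prefix="cpu") as cpu_pool, \
         ThreadPoolExecutor(max_workers=io_threads,
                            thread_name_prefix="io") as io_pool:
        cpu_futures = [cpu_pool.submit(cpu_task, n) for n in cpu_jobs]
        io_futures = [io_pool.submit(io_callback, i) for i in io_jobs]
        io_results = [f.result() for f in io_futures]
        cpu_results = [f.result() for f in cpu_futures]
    return cpu_results, io_results
```

With a single shared pool, the `io_callback` submissions would queue behind whatever `cpu_task` calls already hold the workers; with two pools they only compete with each other.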

Re: Arrow Flight + Go, Arrow for Realtime

2020-08-15 Thread Pierre Belzile
Mark, Did you take a look at FINOS Perspective? It seems to have some interesting overlaps with your goals. I've come across it but have not dug in. I'd be curious to get your thoughts on it. Cheers On Sat., Aug. 15, 2020, 13:05, wrote: > David, > > Still investigating, but I suspect for

float 16

2020-06-08 Thread Pierre Belzile
Hi, There seem to be two competing standards for floats with 16 bits: - https://en.wikipedia.org/wiki/Bfloat16_floating-point_format - IEEE: https://en.wikipedia.org/wiki/IEEE_754-2008_revision Was there any thought on how this could be handled? Would it make sense to add some kind of
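For readers unfamiliar with the two layouts, the following sketch shows why the same 16 bits cannot be interpreted without knowing which format is in use. IEEE half precision has 1 sign, 5 exponent, and 10 mantissa bits; bfloat16 has 1 sign, 8 exponent, and 7 mantissa bits (the upper half of a float32). The helper names are hypothetical, and the bfloat16 conversion here simply truncates, which is one possible rounding choice rather than the only one.

```python
import struct


def float_to_ieee_half_bits(x):
    # The struct module's "e" format is IEEE 754 binary16.
    return struct.unpack("<H", struct.pack("<e", x))[0]


def ieee_half_bits_to_float(bits):
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_bfloat16_bits(x):
    # bfloat16 keeps the upper 16 bits of the float32 encoding.
    # Truncation (round toward zero) is assumed here for simplicity.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16


def bfloat16_bits_to_float(bits):
    # Widen back to float32 by padding the low 16 bits with zeros.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


# The value 1.0 has a different bit pattern in each format.
half_one = float_to_ieee_half_bits(1.0)      # 0x3C00
bf16_one = float_to_bfloat16_bits(1.0)       # 0x3F80

# The same pattern 0x3C00 decodes to different values.
as_half = ieee_half_bits_to_float(0x3C00)    # 1.0
as_bf16 = bfloat16_bits_to_float(0x3C00)     # 0.0078125 (2**-7)
```

This is why a type system needs to record which of the two encodings a 16-bit float column uses, rather than only its width.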

[jira] [Created] (ARROW-8657) Distinguish parquet version 2 logical type vs DataPageV2

2020-04-30 Thread Pierre Belzile (Jira)
Pierre Belzile created ARROW-8657: - Summary: Distinguish parquet version 2 logical type vs DataPageV2 Key: ARROW-8657 URL: https://issues.apache.org/jira/browse/ARROW-8657 Project: Apache Arrow

Re: parquet 2 incompatibility between 0.16 and 0.17?

2020-04-29 Thread Pierre Belzile
> "compatibility mode" option regarding the ConvertedType/LogicalType > annotations and the behavior around conversions when writing unsigned > integers, nanosecond timestamps, and other types to Parquet V1 (which > is the only "production" Parquet format). >

parquet 2 incompatibility between 0.16 and 0.17?

2020-04-29 Thread Pierre Belzile
Hi, We've been using the parquet 2 format (mostly because of nanosecond resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16, when decoding a parquet 2 file created with pyarrow 0.17.0. Is this expected? Would a 0.17 decode a 0.16? If that's not expected, I can put the

[jira] [Created] (ARROW-8006) unsafe arrow dictionary recovered from parquet

2020-03-04 Thread Pierre Belzile (Jira)
Pierre Belzile created ARROW-8006: - Summary: unsafe arrow dictionary recovered from parquet Key: ARROW-8006 URL: https://issues.apache.org/jira/browse/ARROW-8006 Project: Apache Arrow Issue

Re: Crash with 0.15.1 when transposing dicts with nulls values

2020-02-29 Thread Pierre Belzile
cisely are you > doing with the data that is failing? > > On Fri, Feb 28, 2020 at 4:57 PM Pierre Belzile > wrote: > > > > When I recover an array of type dictionary int32 -> string from a parquet > > file and that array has null positions, it seems that the indices that >

Crash with 0.15.1 when transposing dicts with nulls values

2020-02-28 Thread Pierre Belzile
When I recover an array of type dictionary int32 -> string from a parquet file and that array has null positions, it seems that the indices that correspond to null positions are undefined. I.e. not guaranteed to be 0. This causes a crash when using a transpose map when trying to read the transpose
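The failure mode described above can be modelled in a few lines. This is a pure-Python sketch, not the Arrow C++ API: `indices`, `validity`, and `transpose_map` are plain lists, and both function names are hypothetical. It assumes only what the message states, namely that the index stored at a null position may hold an arbitrary value.

```python
def transpose_indices_naive(indices, transpose_map):
    # Looks up every index, including those at null positions.
    # An arbitrary index at a null slot can fall outside the map.
    return [transpose_map[idx] for idx in indices]


def transpose_indices_checked(indices, validity, transpose_map, null_fill=0):
    # Consult the validity flags first and never read the index
    # stored at a null position; write a fixed placeholder instead.
    out = []
    for idx, valid in zip(indices, validity):
        if valid:
            out.append(transpose_map[idx])
        else:
            out.append(null_fill)
    return out


# Dictionary has two entries; the middle slot is null and holds garbage.
indices = [0, 12345, 1]
validity = [True, False, True]
transpose_map = [2, 0]

safe = transpose_indices_checked(indices, validity, transpose_map)
```

In this model the naive version raises `IndexError` on the garbage index, while the checked version yields `[2, 0, 0]`. The alternative design is to guarantee on the producer side that null slots always hold a valid index such as 0.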

[jira] [Created] (ARROW-7376) parquet NaN/null double statistics can result in endless loop

2019-12-11 Thread Pierre Belzile (Jira)
Pierre Belzile created ARROW-7376: - Summary: parquet NaN/null double statistics can result in endless loop Key: ARROW-7376 URL: https://issues.apache.org/jira/browse/ARROW-7376 Project: Apache Arrow