Hi,
Some thoughts:
1. For async IO, the system must have threads that quickly service the
callback. Otherwise the S3/GCS end will close the connection. A single
thread pool where all the threads are doing an expensive compute operation
(like CSV decoding or regex matching) can starve the IO.
2.
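The IO/compute separation in point 1 can be sketched with two dedicated pools. This is a minimal illustration using Python's stdlib, with hypothetical names (`io_pool`, `on_chunk_downloaded`, `decode_csv` are stand-ins, not Arrow's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pools; Arrow's real thread-pool layout differs.
io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="io")
cpu_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="cpu")

def decode_csv(chunk: bytes) -> list:
    # stand-in for an expensive compute step (CSV decoding, regex matching, ...)
    return chunk.decode().split(",")

def on_chunk_downloaded(chunk: bytes):
    # The IO callback does no heavy work itself, so the io_pool thread is
    # free again immediately to service the next S3/GCS event before the
    # remote end times out the connection.
    return cpu_pool.submit(decode_csv, chunk)

future = on_chunk_downloaded(b"a,b,c")
print(future.result())  # ['a', 'b', 'c']
```

The point is only that the callback returns quickly; the expensive work is handed off rather than done on the IO thread.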
Mark,
Did you take a look at FINOS Perspective? It seems to have some interesting
overlaps with your goals. I've come across it but have not dug in.
I'd be curious to get your thoughts on it.
Cheers
On Sat., Aug. 15, 2020, 13:05 , wrote:
> David,
>
> Still investigating, but I suspect for
Hi,
There seem to be two competing standards for 16-bit floats:
- https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
- IEEE: https://en.wikipedia.org/wiki/IEEE_754-2008_revision
Was there any thought on how this could be handled? Would it make sense to
add some kind of
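For reference, the two formats split the 16 bits differently: bfloat16 uses 1 sign / 8 exponent / 7 mantissa bits, while IEEE binary16 uses 1 / 5 / 10. A quick stdlib sketch of the bfloat16 layout (it is just the top half of a float32, truncated), purely to illustrate the difference and not a proposal for any API:

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    # bfloat16 is the top 16 bits of an IEEE float32 (truncation, no rounding)
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16

def from_bfloat16_bits(b: int) -> float:
    # widen back to float32 by zero-filling the dropped mantissa bits
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

print(hex(to_bfloat16_bits(1.0)))  # 0x3f80: sign 0, exponent 0x7f, mantissa 0
print(from_bfloat16_bits(to_bfloat16_bits(1.5)))  # 1.5 (exactly representable)
```

Because bfloat16 keeps the full float32 exponent, it trades precision for range; IEEE binary16 does the opposite, which is presumably why both exist.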
Pierre Belzile created ARROW-8657:
-
Summary: Distinguish parquet version 2 logical type vs DataPageV2
Key: ARROW-8657
URL: https://issues.apache.org/jira/browse/ARROW-8657
Project: Apache Arrow
> "compatibility mode" option regarding the ConvertedType/LogicalType
> annotations and the behavior around conversions when writing unsigned
> integers, nanosecond timestamps, and other types to Parquet V1 (which
> is the only "production" Parquet format).
>
Hi,
We've been using the Parquet 2 format (mostly because of nanosecond
resolution). I'm getting crashes in the C++ Parquet decoder, Arrow 0.16,
when decoding a Parquet 2 file created with pyarrow 0.17.0. Is this
expected? Would a 0.17 decoder read a 0.16 file?
If that's not expected, I can put the
Pierre Belzile created ARROW-8006:
-
Summary: unsafe arrow dictionary recovered from parquet
Key: ARROW-8006
URL: https://issues.apache.org/jira/browse/ARROW-8006
Project: Apache Arrow
Issue
cisely are you
> doing with the data that is failing?
>
> On Fri, Feb 28, 2020 at 4:57 PM Pierre Belzile
> wrote:
> >
> > When I recover an array of type dictionary int32 -> string from a parquet
> > file and that array has null positions, it seems that the indices that
> &
When I recover an array of type dictionary int32 -> string from a parquet
file and that array has null positions, it seems that the indices that
correspond to null positions are undefined, i.e. not guaranteed to be 0.
This causes a crash when using a transpose map when trying to read the
transpose
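A plain-Python sketch of the hazard (hypothetical data, not the Arrow implementation): if the index stored under a null slot is garbage, feeding it blindly through a transpose map reads out of bounds, so the validity mask has to be consulted first:

```python
dictionary = ["red", "green", "blue"]
indices = [2, 7, 0]          # slot 1 is null; its stored index (7) is garbage
valid = [True, False, True]  # validity mask for the three slots

# hypothetical map from old dictionary indices to new ones
transpose = {0: 10, 1: 11, 2: 12}

# unsafe: transpose[indices[1]] would fail, since 7 is not a valid
# dictionary index; checking validity first avoids touching garbage
safe = [transpose[i] if ok else None for i, ok in zip(indices, valid)]
print(safe)  # [12, None, 10]
```

The fix amounts to either masking indices under null slots or guaranteeing they are written as a valid value such as 0.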
Pierre Belzile created ARROW-7376:
-
Summary: parquet NaN/null double statistics can result in endless
loop
Key: ARROW-7376
URL: https://issues.apache.org/jira/browse/ARROW-7376
Project: Apache Arrow