Re: [C++] read Parquet columns into 64-bit offset types

2021-01-22 Thread Micah Kornfield
This seems like a bug or a miss. I opened: https://issues.apache.org/jira/browse/ARROW-11353 to track a fix.

On Sun, Jan 17, 2021 at 9:18 PM Steve Kim wrote:
> > This should be possible already, at least on git master but perhaps also
> > in 2.0.0. Which problem are you encountering?
>
> With

Re: [C++] read Parquet columns into 64-bit offset types

2021-01-17 Thread Steve Kim
> This should be possible already, at least on git master but perhaps also
> in 2.0.0. Which problem are you encountering?

With pyarrow 2.0.0, I encountered the following:

```
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>
```

Re: [C++] read Parquet columns into 64-bit offset types

2021-01-09 Thread Antoine Pitrou
This should be possible already, at least on git master but perhaps also in 2.0.0. Which problem are you encountering?

On 09/01/2021 at 05:27, Steve Kim wrote:
> Is it possible to read Parquet columns into an Arrow schema that has
> variable-width types with 64-bit offsets (LargeBinary, Larg

[C++] read Parquet columns into 64-bit offset types

2021-01-08 Thread Steve Kim
Is it possible to read Parquet columns into an Arrow schema that has variable-width types with 64-bit offsets (LargeBinary, LargeList, etc.)? For my current use case, I prefer the large types because the data overflow 32-bit offsets, and it is easier to waste memory with 8 bytes per offset than it
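
To make the offset trade-off in the question concrete, here is a minimal sketch in plain Python (no Arrow calls) of how variable-width offsets accumulate and when they stop fitting in a signed 32-bit integer. It assumes the usual layout in which a column of n values stores n + 1 offsets into one contiguous data buffer. The helper names (`build_offsets`, `fits_int32`, `offset_buffer_bytes`) and the example column sizes are hypothetical and exist only for illustration.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(value_lengths):
    """Return the n + 1 cumulative offsets for n values of the given byte lengths."""
    offsets = [0]
    for length in value_lengths:
        offsets.append(offsets[-1] + length)
    return offsets


def fits_int32(value_lengths):
    """True if the last offset (the total data size) fits in a signed 32-bit integer."""
    return sum(value_lengths) <= INT32_MAX


def offset_buffer_bytes(num_values, offset_width):
    """Size in bytes of the offsets buffer: (n + 1) offsets of the given width."""
    return (num_values + 1) * offset_width


# Three short values: each offset marks where one value ends and the next begins.
print(build_offsets([3, 0, 5]))

# Hypothetical column: 3 million values of 1 KiB each (about 3 GB of data).
num_values, bytes_per_value = 3_000_000, 1024
total_bytes = num_values * bytes_per_value
print(total_bytes > INT32_MAX)              # True means 32-bit offsets overflow
print(offset_buffer_bytes(num_values, 4))   # offsets buffer size with 32-bit offsets
print(offset_buffer_bytes(num_values, 8))   # offsets buffer size with 64-bit offsets
```

In this hypothetical column the wider offsets cost roughly 12 MB extra against about 3 GB of values, which illustrates why spending 8 bytes per offset can be an acceptable price for avoiding overflow.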