This seems like a bug or an omission. I opened
https://issues.apache.org/jira/browse/ARROW-11353 to track a fix.
On Sun, Jan 17, 2021 at 9:18 PM Steve Kim wrote:
> This should be possible already, at least on git master but perhaps also
> in 2.0.0. Which problem are you encountering?
With pyarrow 2.0.0, I encountered the following:
```
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
```
This should be possible already, at least on git master but perhaps also
in 2.0.0. Which problem are you encountering?
On 09/01/2021 at 05:27, Steve Kim wrote:
> Is it possible to read Parquet columns into an Arrow schema that has
> variable-width types with 64-bit offsets (LargeBinary, LargeList, etc.)?
Is it possible to read Parquet columns into an Arrow schema that has
variable-width types with 64-bit offsets (LargeBinary, LargeList, etc.)?
For my current use case, I prefer the large types because the data overflow
32-bit offsets, and it is easier to waste memory with 8 bytes per offset
than it