OK, thanks, I will work with delta dictionaries.

How do delta dictionaries solve the random access issue?

On Tue, Feb 22, 2022 at 9:51 AM Micah Kornfield <[email protected]>
wrote:

> Dictionary replacement isn't supported in the file format because the
> metadata makes it difficult to associate a particular dictionary with a
> record batch for random access.
>
> Delta dictionaries are supported, but there was a long-standing bug that
> prevented their use in Python (
> https://issues.apache.org/jira/browse/ARROW-13467).  If you are still
> seeing issues in pyarrow 7.0 please open a bug.
>
> As for the usefulness of the file format without these features, that is
> really use-case dependent.
>
> Cheers,
> Micah
>
> On Tuesday, February 22, 2022, Chris Nuernberger <[email protected]>
> wrote:
>
>> How are dictionaries intended to be used in a file with multiple record
>> batches?
>>
>> I tried saving record-batch-specific dictionaries and got this error from
>> Python:
>>
>>  > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or
>> dictionary delta in IPC file
>>
>> This seems to defeat the purpose of having multiple record batches in a
>> single arrow file; the workaround appears to be to either preprocess the
>> entire sequence of datasets to unify the dictionaries or save multiple
>> arrow files.
>>
>
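
On the question of how delta dictionaries relate to random access, here is a minimal pure-Python sketch of one way to picture it. It is a toy model, not pyarrow code and not the Arrow reader's actual implementation: the helper names (resolve_with_deltas, resolve_with_replacement) and the list-of-lists "messages" are hypothetical, chosen only for illustration. The assumption is that a delta only appends values, so one concatenated dictionary serves every batch, whereas a replacement makes the meaning of an index depend on where the batch sits in the file.

```python
# Toy model, not pyarrow: dictionary messages are plain lists of values,
# and a record batch column is a list of integer indices into them.

def resolve_with_deltas(dictionary_messages, batch_indices):
    """Delta semantics: concatenate every dictionary message once,
    then decode any batch on its own, in any order."""
    full_dictionary = [v for message in dictionary_messages for v in message]
    return [full_dictionary[i] for i in batch_indices]


def resolve_with_replacement(dictionary_messages, batch_position, batch_indices):
    """Replacement semantics: the reader must also know which dictionary
    was in effect at this batch's position (hypothetical bookkeeping)."""
    return [dictionary_messages[batch_position][i] for i in batch_indices]


# Delta case: the second message only appends new values.
delta_messages = [["apple", "banana"], ["cherry"]]
batch0 = [0, 1, 0]
batch1 = [2, 0]  # index 0 still refers to the first message's "apple"

# Decode the second batch first to mimic random access.
second = resolve_with_deltas(delta_messages, batch1)
first = resolve_with_deltas(delta_messages, batch0)

# Replacement case: the same index means different things per batch.
replacement_messages = [["apple", "banana"], ["cherry", "apple"]]
in_batch0 = resolve_with_replacement(replacement_messages, 0, [0])
in_batch1 = resolve_with_replacement(replacement_messages, 1, [0])
```

In the delta case either batch decodes against the same concatenated dictionary, so no per-batch association is needed. In the replacement case the index 0 resolves differently depending on the batch position, which is the association that the quoted reply says the file metadata makes difficult.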
