Dictionary replacement isn't supported in the file format because the
metadata makes it difficult to associate a particular dictionary with a
record batch for random access.

Delta dictionaries are supported, but there was a long-standing bug that
prevented their use in Python (
https://issues.apache.org/jira/browse/ARROW-13467).  If you are still
seeing issues in pyarrow 7.0, please open a bug.
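
To make the distinction concrete, here is a rough plain-Python sketch (not
pyarrow code; the function names are made up for illustration, and lists
stand in for Arrow dictionaries). It shows why deltas are compatible with
random access: the dictionary only ever grows, so indices written for an
earlier batch stay valid against the concatenation of all deltas.

```python
# Conceptual sketch only: plain Python lists stand in for Arrow dictionaries.
# encode_with_deltas and decode_batch are hypothetical helpers, not pyarrow APIs.

def encode_with_deltas(batches):
    """Dictionary-encode batches, emitting only the new values per batch."""
    lookup = {}    # value -> index in the single, growing dictionary
    deltas = []    # new dictionary values introduced by each batch
    encoded = []   # one list of indices per batch
    for batch in batches:
        delta = []
        indices = []
        for value in batch:
            if value not in lookup:
                # Append-only: existing indices are never reassigned.
                lookup[value] = len(lookup)
                delta.append(value)
            indices.append(lookup[value])
        deltas.append(delta)
        encoded.append(indices)
    return deltas, encoded


def decode_batch(deltas, encoded, i):
    """Random access: decode batch i using the concatenation of all deltas."""
    full_dictionary = [value for delta in deltas for value in delta]
    return [full_dictionary[index] for index in encoded[i]]


batches = [["a", "b", "a"], ["b", "c"], ["d", "a"]]
deltas, encoded = encode_with_deltas(batches)
```

With replacement, by contrast, decoding batch i would require knowing
which of several unrelated dictionaries was in effect when that batch was
written, which is the association the file metadata makes difficult.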

As for the usefulness of the file format without these features, that
is really use-case dependent.

Cheers,
Micah

On Tuesday, February 22, 2022, Chris Nuernberger <[email protected]>
wrote:

> How are dictionaries intended to be used in a file with multiple record
> batches?
>
> I tried saving record-batch-specific dictionaries and got this error from
> python:
>
>  > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or
> dictionary delta in IPC file
>
> This seems to defeat the purpose of having multiple record batches in a
> single arrow file; the workaround appears to be to either preprocess the
> entire sequence of datasets to unify the dictionaries or save multiple
> arrow files.
>
