Dictionary replacement isn't supported in the file format because the metadata makes it difficult to associate a particular dictionary with a record batch for random access.
Delta dictionaries are supported, but a long-standing bug prevented their use in Python (https://issues.apache.org/jira/browse/ARROW-13467). If you are still seeing issues in pyarrow 7.0, please open a bug.

As for how useful the file format is without these features, that really depends on the use case.

Cheers,
Micah

On Tuesday, February 22, 2022, Chris Nuernberger <[email protected]> wrote:

> How are dictionaries intended to be used in a file with multiple record
> batches?
>
> I tried saving record-batch-specific dictionaries and got this error from
> python:
>
> > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or
> dictionary delta in IPC file
>
> This seems to defeat the purpose of having multiple record batches in a
> single arrow file; the work around appears to be to either preprocess the
> entire sequence of datasets to unify the dictionaries or save multiple
> arrow files.
