OK, thanks, I will work with delta dictionaries. How do delta dictionaries solve the random access issue?
On Tue, Feb 22, 2022 at 9:51 AM Micah Kornfield <[email protected]> wrote:

> Dictionary replacement isn't supported in the file format because the
> metadata makes it difficult to associate a particular dictionary with a
> record batch for random access.
>
> Delta dictionaries are supported, but there was a long-standing bug that
> prevented their use in Python
> (https://issues.apache.org/jira/browse/ARROW-13467). If you are still
> seeing issues in pyarrow 7.0, please open a bug.
>
> Regarding the usefulness of the file format without these features,
> that is really use-case dependent.
>
> Cheers,
> Micah
>
> On Tuesday, February 22, 2022, Chris Nuernberger <[email protected]>
> wrote:
>
>> How are dictionaries intended to be used in a file with multiple record
>> batches?
>>
>> I tried saving record-batch-specific dictionaries and got this error from
>> Python:
>>
>> > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or dictionary delta in IPC file
>>
>> This seems to defeat the purpose of having multiple record batches in a
>> single Arrow file; the workaround appears to be either to preprocess the
>> entire sequence of datasets to unify the dictionaries or to save multiple
>> Arrow files.
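A rough way to see why deltas keep random access workable is the plain-Python model below. It does not use pyarrow at all: `DictBatch`, `apply_dictionary_batches`, and `decode` are hypothetical names, and plain lists stand in for Arrow arrays. The model assumes the rule from the Arrow columnar format spec that an IPC file may hold only one non-delta dictionary batch per dictionary id, with deltas applied in the order they appear in the file footer. Because a delta only appends values, a reader can load every dictionary batch listed in the footer up front and then decode any record batch with the combined dictionary, without tracking which batch each delta arrived with. A replacement, by contrast, would change what earlier indices mean.

```python
# Conceptual model only: plain lists stand in for Arrow arrays.
# DictBatch, apply_dictionary_batches and decode are hypothetical, not pyarrow API.
from dataclasses import dataclass


@dataclass
class DictBatch:
    dict_id: int
    values: list
    is_delta: bool


def apply_dictionary_batches(footer_dicts):
    """Combine dictionary batches per id, in footer order, into final dictionaries."""
    dictionaries = {}
    for batch in footer_dicts:
        if batch.dict_id not in dictionaries:
            # Assumption of this model: the base (non-delta) batch comes first.
            if batch.is_delta:
                raise ValueError("delta dictionary without a base dictionary")
            dictionaries[batch.dict_id] = list(batch.values)
        elif batch.is_delta:
            # A delta only appends, so every index used by an earlier
            # record batch still points at the same value.
            dictionaries[batch.dict_id].extend(batch.values)
        else:
            # A second non-delta batch is a replacement: indices in earlier
            # record batches would silently change meaning.
            raise ValueError("dictionary replacement not allowed in IPC file")
    return dictionaries


def decode(indices, dictionary):
    """Turn dictionary indices back into values."""
    return [dictionary[i] for i in indices]


# Three record batches; the 2nd and 3rd introduce new categories via deltas.
footer_dicts = [
    DictBatch(0, ["a", "b"], is_delta=False),
    DictBatch(0, ["c"], is_delta=True),
    DictBatch(0, ["d"], is_delta=True),
]
record_batches = [[0, 1, 0], [2, 1], [3, 0, 2]]

dicts = apply_dictionary_batches(footer_dicts)
# Random access: decode only the last batch, never touching the first two.
print(decode(record_batches[2], dicts[0]))  # ['d', 'a', 'c']
```

The trade-off this illustrates: the file writer must keep the dictionary append-only (new categories go at the end), which is exactly what a delta is; if a later batch needs a dictionary whose existing entries differ, the remaining options are still the ones from the original question, unifying dictionaries up front or writing separate files.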
