Arrow will try to pass along the dictionary directly to parquet so if data is already in that form it could be faster. But the first encoding attempted for raw types is dictionary encoding so if cardinality is low enough you should end up with similar space savings.
On Thursday, June 23, 2022, Kirby, Adam <[email protected]> wrote: > For writing Parquet using ParquetWriter, is there any particular advantage > (in terms of ending up with a compact dictionary encoded column being > efficiently written in Parquet) to starting with a DictionaryArray versus a > simpler type? > > Thank you! >
