There's a relevant Jira issue here (and maybe some others), if someone
wants to pick it up and write a kernel for it:
https://issues.apache.org/jira/browse/ARROW-4097
I think having an improved experience around this dictionary
conformance/normalization problem would be valuable.
On Tue, May 31,
I don't think you are missing anything. The Parquet encoding is baked
into the data on disk, so re-encoding at some stage is inevitable.
Re-encoding in Python like you are doing is going to be inefficient; I
think you will want to do the re-encoding in C++. Unfortunately, I
don't think we
Hi,
Background:
I need to optimize read speed for few-column lookups in large
datasets. Currently I keep the data in Plasma for fast reads, but
Plasma is cumbersome to manage when the data changes frequently (and
it “locks” the RAM). Instead I’m trying to figure out a fast-enough