Thank you Gopal for the explanation !
Is there any JIRA ticket that tracks the String dictionary encoding ? :) --
I can see it has huge potential values, for example map type is used as a
flexible struct :)
On Wed, Nov 30, 2016 at 11:51 PM, Gopal Vijayaraghavan <gop...@apache.org>
> > I am curious about how is map type serialized in ORC files? -- One
> simple guess would be (conceptually) storing two arrays for key and value,
> respectively :).
> Close enough, it's stored as 3 streams.
> childrenWriters.writeBatch(vec.keys, childOffset,
> childrenWriters.writeBatch(vec.values, childOffset,
> The first stream stores the cardinality of the map and the other two
> stores keys and values in whatever type they might represent, in sequence
> (to be read out & zipped to tuples).
> Actually, I've yet to confirm if the String dictionary encoding kicks in
> for the key streams - usually for things like attributes, the keys repeat
> in massive numbers for that to be a useful optimization.
Wenlei Xie (谢文磊)