Hi Saeid,

It's not based on the number of distinct values, but rather on the combined
size of those values (i.e. the size of the dictionary). I believe the default
limit is 256 KB, so assuming your strings are fairly short, a column with a
few thousand distinct values is likely to be dictionary-encoded. Note that
dictionaries are built per rowset (a small chunk of data), so even if your
overall cardinality is much larger, you're still likely to benefit as long as
there is some spatial locality, i.e. rows with nearby primary keys share a
smaller set of distinct values.
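To make the size-based rule and the per-rowset effect concrete, here is a toy model. It is a minimal sketch, not Kudu's actual encoder: `DICT_BUDGET_BYTES`, `encode_rowset` and `encodings_per_rowset` are hypothetical names, the 256 KB budget is the default recalled above (not verified), and the real implementation works inside Kudu's on-disk file format with its own bookkeeping overhead, which is ignored here.

```python
# Assumed dictionary budget (the "256kb" default recalled above; unverified).
DICT_BUDGET_BYTES = 256 * 1024


def encode_rowset(values, budget=DICT_BUDGET_BYTES):
    """Toy model: 'dict' if this rowset's distinct values fit in the budget, else 'plain'.

    The decision depends on the combined byte size of the distinct values,
    not on how many distinct values there are.
    """
    seen = set()
    dict_bytes = 0
    for v in values:
        if v not in seen:
            size = len(v.encode("utf-8"))
            if dict_bytes + size > budget:
                return "plain"  # dictionary would overflow: fall back
            seen.add(v)
            dict_bytes += size
    return "dict"


def encodings_per_rowset(values, rowset_size, budget=DICT_BUDGET_BYTES):
    """Apply the toy rule independently to each rowset-sized chunk of rows."""
    return [
        encode_rowset(values[i:i + rowset_size], budget)
        for i in range(0, len(values), rowset_size)
    ]


if __name__ == "__main__":
    # 5,000 distinct 40-byte strings = 200,000 bytes, under the budget.
    print(encode_rowset([f"{i:040d}" for i in range(5000)]))   # dict
    # 10,000 distinct 40-byte strings = 400,000 bytes, over the budget.
    print(encode_rowset([f"{i:040d}" for i in range(10000)]))  # plain

    # 20,000 distinct values overall, but nearby rows share values.
    local = [f"{i // 10:040d}" for i in range(200_000)]
    print(encodings_per_rowset(local, 50_000))      # ['dict', 'dict', 'dict', 'dict']

    # Same cardinality with no locality: every rowset sees all 20,000 values.
    scattered = [f"{i % 20_000:040d}" for i in range(200_000)]
    print(encodings_per_rowset(scattered, 50_000))  # ['plain', 'plain', 'plain', 'plain']
```

The last two cases have identical overall cardinality; only the ordering of rows differs, which is why locality in the primary key matters more than the global number of distinct values.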

-Todd

On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari wrote:

> Hi Kudu community,
>
> Does anybody know the maximum number of distinct values a String column
> can have for Kudu to use Dictionary encoding? Many thanks :)
>
> br,
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
