Re: Composite primary key

Todd Lipcon Tue, 05 Sep 2017 16:39:27 -0700

Hi Janne,

This is a good question.

If you never plan on actually querying based on those columns themselves,
concatenating them into a binary column as the single PK will save a bit of
space relative to storing them separately. In the case of a composite
primary key, Kudu will internally encode a binary concatenated column and
store it using prefix encoding. So, if you store them separately, you'll
get the same composite binary encoding plus the additional storage for the
separate columns.
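
As a rough illustration of why the prefix-encoded composite key stays small, here is a minimal Python sketch. The encoding in encode_key (topic bytes, a two-byte terminator, big-endian integers) and the size accounting in prefix_encoded_size are simplifying assumptions made for this example, not Kudu's actual on-disk format.

```python
import struct

def encode_key(topic: str, partition: int, offset: int) -> bytes:
    """Hypothetical order-preserving encoding of (topic, partition, offset).

    Assumption for illustration only: topic bytes, a 0x00 0x00 terminator,
    then the partition and offset as big-endian unsigned integers.
    """
    return topic.encode("utf-8") + b"\x00\x00" + struct.pack(">IQ", partition, offset)

def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prefix_encoded_size(keys) -> int:
    """Bytes needed if each key stores only the suffix that differs from
    the previous key (length headers are ignored in this sketch)."""
    total = 0
    prev = b""
    for k in keys:
        total += len(k) - shared_prefix_len(prev, k)
        prev = k
    return total

# 100 consecutive offsets from one topic/partition, already in key order.
keys = [encode_key("events", 3, off) for off in range(1000, 1100)]
raw_size = sum(len(k) for k in keys)
packed_size = prefix_encoded_size(keys)
print(raw_size, packed_size)
```

Because consecutive Kafka offsets share almost the whole key, nearly every byte of each key is a repeated prefix, so the encoded index costs far less than the raw key bytes.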

However, if you have any use case for querying based on them, having the
separate columns would be quite useful, since Kudu can push down predicates
to individual columns.
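
To make the trade-off concrete, the following plain-Python sketch (no Kudu client involved) contrasts filtering on a separate column with filtering on a manually concatenated key. The Row type, the "/" separator, and the pack/unpack_partition helpers are hypothetical names chosen for this example.

```python
from typing import NamedTuple

class Row(NamedTuple):
    topic: str
    partition: int
    offset: int
    payload: str

rows = [
    Row("events", 0, 10, "a"),
    Row("events", 1, 11, "b"),
    Row("audit", 1, 12, "c"),
]

# Separate columns: the filter names the column directly, which is the
# kind of per-column predicate a storage engine can evaluate itself.
by_column = [r for r in rows if r.partition == 1]

# Manually concatenated key: the store only sees one opaque value, so
# the client has to decode every key before it can test the partition.
def pack(r: Row) -> str:
    return f"{r.topic}/{r.partition}/{r.offset}"

def unpack_partition(key: str) -> int:
    return int(key.split("/")[1])

packed = [(pack(r), r.payload) for r in rows]
by_decoding = [(k, p) for k, p in packed if unpack_partition(k) == 1]
```

Both filters select the same rows, but only the first is expressed in terms the table schema knows about.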

Being able to use the subfields for partitioning is also likely to be
useful, e.g. you might want to hash-partition on 'topic+partition' together
so that all data for a given topic/partition pair always ends up stored
together. This wouldn't be possible if you use a combined (manually-encoded)
key.
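
A small sketch of the idea, in plain Python: hashing only the (topic, partition) subfields sends every offset from the same Kafka partition to the same bucket. The CRC32 hash, the bucket count, and the bucket_for helper are assumptions for illustration; Kudu uses its own hash function and the bucket count is set when the table is created.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count for this example

def bucket_for(topic: str, partition: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map (topic, partition) to a bucket; the offset is deliberately
    left out so that all offsets of one partition stay together."""
    data = f"{topic}\x00{partition}".encode("utf-8")
    return zlib.crc32(data) % num_buckets

messages = [("events", 3, off) for off in range(5)] + \
           [("audit", 0, off) for off in range(5)]

# Group rows by the bucket chosen from their (topic, partition) subfields.
buckets = {}
for topic, partition, offset in messages:
    buckets.setdefault(bucket_for(topic, partition), []).append((topic, partition, offset))
```

With a single manually-encoded key column, the hash would cover the offset as well, so rows from one Kafka partition would be scattered across buckets.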

-Todd

On Fri, Aug 25, 2017 at 11:10 PM, Janne Keskitalo <[email protected]>
wrote:

> Hi
>
> We're inserting messages from kafka into kudu tables and some messages
> don't have a natural primary key, hence we decided to use kafka
> topic/partition/offset combination as the key. Is it better to concatenate
> the fields into one kudu column or create a separate column for each? Do we
> get better compression if using individual columns? And is the PK index
> structure maintained outside of the actual table data?
>
> --
> Br.
> Janne Keskitalo,
> Database Architect, PAF.COM
> For support: [email protected]
>
>

-- 
Todd Lipcon
Software Engineer, Cloudera
