Hi Alexey,

I don’t think UPSERT works in this case. For example, we have a table where
the “key” field is the unique ID for each record, but in order to use range
partitioning on “day”, we have to use (key, day) as the primary key.

The first record, (key1, day1, value1), arrives and is inserted into the table.
Later on, another record, (key1, day2, value2), arrives; its day is day2 rather
than day1 because the time has moved on to the next day.
If we use UPSERT, there will be two records in the table:
(key1, day1, value1) and (key1, day2, value2).

What we want is a single record: (key1, day1, value2).
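To make the two outcomes concrete, here is a toy Python model (not the Kudu client API) in which a dict keyed by the primary key stands in for the table and `upsert` inserts or overwrites a row. With (key, day) as the primary key, the second record lands in a new row. The second half sketches one possible workaround, assuming the writer can look up the day already stored for a key and reuse it on later writes; `upsert_keep_first_day` and the `first_day` index are hypothetical names for illustration only.

```python
from typing import Dict, Tuple

# Toy table: primary key (key, day) -> value. This is NOT the Kudu API.
Table = Dict[Tuple[str, str], str]


def upsert(table: Table, key: str, day: str, value: str) -> None:
    # UPSERT in this model: insert if the primary key is new, else overwrite.
    table[(key, day)] = value


# Plain UPSERT with (key, day) as the primary key: two rows result.
table: Table = {}
upsert(table, "key1", "day1", "value1")
upsert(table, "key1", "day2", "value2")
print(sorted(table.items()))
# [(('key1', 'day1'), 'value1'), (('key1', 'day2'), 'value2')]


# Hypothetical workaround: remember the first day seen for each key and
# reuse it, so later records overwrite the existing row instead.
def upsert_keep_first_day(table: Table, first_day: Dict[str, str],
                          key: str, day: str, value: str) -> None:
    stored_day = first_day.setdefault(key, day)  # first day wins
    upsert(table, key, stored_day, value)


table2: Table = {}
first_day: Dict[str, str] = {}
upsert_keep_first_day(table2, first_day, "key1", "day1", "value1")
upsert_keep_first_day(table2, first_day, "key1", "day2", "value2")
print(sorted(table2.items()))
# [(('key1', 'day1'), 'value2')]
```

In a real deployment the key-to-day lookup would have to live somewhere durable (for example, a read of the table itself or a separate key-to-day table), and two pipelines racing on the same new key could still choose different days; this sketch does not address that.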

Regards
Ray
From: Alexey Serbin <aser...@cloudera.com>
Reply-To: "user@kudu.apache.org" <user@kudu.apache.org>
Date: Thursday, May 7, 2020 at 05:13
To: "user@kudu.apache.org" <user@kudu.apache.org>
Subject: Re: Why does partition keys have to be in the primary keys?

Hi,

The restriction that the partitioning key be composed of primary key columns
significantly simplifies the design and implementation.

However, I'm not sure I understand why the rules of partitioning come into play
here.  To me it looks like the main question is about the schema for the table,
i.e. what the primary key should be.  If different pipelines use different
values for the 'day' field, but a single resulting record is expected, does that
imply that the pipelines need to update already existing records?  If so, then
maybe use UPSERT instead of INSERT for those pipelines?
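The INSERT/UPSERT distinction can be modelled in a few lines of Python. This is a toy in-memory model, not the Kudu client API: the hypothetical `insert` rejects a row whose primary key already exists, while `upsert` overwrites it. The primary key here is assumed to be `key` alone, purely for illustration.

```python
from typing import Dict


class DuplicateKeyError(Exception):
    """Raised by the toy insert() when the primary key already exists."""


def insert(table: Dict[str, str], key: str, value: str) -> None:
    # INSERT in this model: fail if the primary key is already present.
    if key in table:
        raise DuplicateKeyError(key)
    table[key] = value


def upsert(table: Dict[str, str], key: str, value: str) -> None:
    # UPSERT in this model: insert a new row or overwrite the existing one.
    table[key] = value


table: Dict[str, str] = {}
insert(table, "key1", "value1")       # first pipeline
try:
    insert(table, "key1", "value2")   # second pipeline, same key
except DuplicateKeyError as err:
    print("insert rejected:", err)    # insert rejected: key1
upsert(table, "key1", "value2")
print(table)                          # {'key1': 'value2'}
```

With `key` alone as the primary key, both writes address the same row regardless of the day; as Ray's reply points out, that stops being true once `day` has to be part of the primary key for range partitioning.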

I would start by figuring out what the primary key for the table should be to
satisfy the requirements.  Once that's clear, I'd think about the partitioning
rules for the table.


Thanks,

Alexey

On Wed, May 6, 2020 at 4:54 AM Ray Liu (rayliu)
<ray...@cisco.com> wrote:
We have two pipelines writing to the same table, and that table is
range-partitioned by the “day” field.

Each pipeline fills in some of the fields for the same key.

But the “day” value may differ between the two pipelines.

Because range partition key columns must be part of the primary key, the result
table ends up with two records.

What we want is one complete record.

So my question is: why do partition keys have to be part of the primary key?

Is there any workaround for this?
