Hi Alexey,

I don’t think UPSERT works in this case. For example, we have a table in which the “key” field is the unique ID for each record, but in order to range partition by “day”, we have to use key+day as the primary key.
The first record (key1, day1, value1) arrives and is inserted into the table. Later on, another record (key1, day2, value2) arrives. If we use UPSERT, there will be two records in the table: (key1, day1, value1) and (key1, day2, value2). (day1 became day2 because the date rolled over to the next day.)

What we want is one record: (key1, day1, value2).

Regards,
Ray

From: Alexey Serbin <aser...@cloudera.com>
Reply-To: "user@kudu.apache.org" <user@kudu.apache.org>
Date: Thursday, May 7, 2020 at 05:13
To: "user@kudu.apache.org" <user@kudu.apache.org>
Subject: Re: Why does partition keys have to be in the primary keys?

Hi,

The restriction that the partitioning key be composed of primary key columns significantly simplifies the design and implementation.

However, I'm not sure I understand why the rules of partitioning come into play here. To me it looks like the main question is about the schema for the table, i.e. what the primary key should be. If different pipelines use different values for the 'day' field, but one result record is expected, does that imply the pipelines need to update already existing records? If so, then maybe use UPSERT instead of INSERT for those pipelines?

I would start by trying to understand what the primary key for the table needs to be to satisfy the requirements. Once that's clear, I'd think about the partitioning rules for the table.

Thanks,

Alexey

On Wed, May 6, 2020 at 4:54 AM Ray Liu (rayliu) <ray...@cisco.com> wrote:

We have two pipelines writing to the same table, and that table is range partitioned by the “day” field. Each pipeline fills in some of the fields of the record with the same key, but the “day” value in the two pipelines may differ. Because range partition keys must be part of the primary key, there will be two records in the result table. What we want is one complete record.

So my question is: why do partition keys have to be in the primary key? Is there any workaround for this?
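
To make the scenario in this thread concrete, here is a minimal Python sketch of upsert semantics over an in-memory dictionary. It is only a toy model, not the Kudu client API: the `upsert` and `run` helpers are hypothetical, and the assumption that an upsert leaves unset columns of an existing row untouched is a property of this model rather than a claim about Kudu. The key-only variant is shown purely for contrast; under the restriction discussed above, such a table could not be range partitioned by “day”.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def upsert(table: Dict[Tuple[str, ...], Row],
           pk_columns: Tuple[str, ...],
           row: Row) -> None:
    """Insert the row, or merge it into the row with the same primary key."""
    pk = tuple(row[c] for c in pk_columns)
    existing = table.get(pk, {})
    # Model assumption: columns not present in `row` keep their old values.
    table[pk] = {**existing, **row}


def run(pk_columns: Tuple[str, ...], rows: List[Row]) -> List[Tuple[str, str, str]]:
    """Apply the upserts in order and return the resulting rows, sorted."""
    table: Dict[Tuple[str, ...], Row] = {}
    for row in rows:
        upsert(table, pk_columns, row)
    return sorted((r["key"], r["day"], r["value"]) for r in table.values())


first = {"key": "key1", "day": "day1", "value": "value1"}

# Primary key (key, day): the second write has a different day, so it
# does not match the first row and a second row is created.
composite = run(("key", "day"),
                [first, {"key": "key1", "day": "day2", "value": "value2"}])

# Primary key (key) only: the second write sets just the value column
# and lands on the existing row.
key_only = run(("key",),
               [first, {"key": "key1", "value": "value2"}])

print(composite)
print(key_only)
```

With the composite key the model ends up with the two rows described at the top of the thread, while the key-only model ends up with the single row (key1, day1, value2) that is wanted.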