Hi Paul,

Sorry for the slow follow-up, I've been pulled a few different ways with the upcoming 1.3 release. The issue you ran into is KUDU-1792 <https://issues.apache.org/jira/browse/KUDU-1792>, which was fixed in Kudu 1.2. KUDU-1792 only comes into play when adding a range partition where either the upper or lower bound is unbounded, but that is actually the case in your repro example, due to a copy/paste error where the lower bound is set twice and the upper bound is never set. I think the fix is to upgrade to Kudu 1.2 and recreate the table if it still has the buggy partitions.

Thanks again for the report!
- Dan

On Tue, Feb 28, 2017 at 1:03 PM, Dan Burkert <[email protected]> wrote:

> Yep: https://issues.apache.org/jira/browse/KUDU-1903
>
> - Dan
>
> On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hey Dan,
>>
>> Mind filing a critical or blocker JIRA against 1.3 so we can track
>> remaining things that should go into the branch before release?
>>
>> -Todd
>>
>> On Tue, Feb 28, 2017 at 10:05 AM, Dan Burkert <[email protected]>
>> wrote:
>>
>>> Hey Paul,
>>>
>>> Thanks for checking that out and following up. I'm going to try and
>>> root cause this today so that we have plenty of time to get a fix in to 1.3
>>> if it requires one. Thanks again for the report. In the meantime, let me
>>> know if the alter table workaround is not enough for you to make progress
>>> with Kudu.
>>>
>>> -Dan
>>>
>>>
>>> On Mon, Feb 27, 2017 at 3:02 PM Paul Brannan <[email protected]> wrote:
>>>
>>> One side-effect of neglecting to drop the unbounded range partition: I
>>> get a stack trace when I try to scan:
>>>
>>> F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it !=
>>> collection.end() Map key not found: ▒3
>>> *** Check failure stack trace: ***
>>>     @ 0x7fca2a5506ad (unknown)
>>>     @ 0x7fca2a55271c (unknown)
>>>     @ 0x7fca2a550209 (unknown)
>>>     @ 0x7fca2a5530af (unknown)
>>>     @ 0x7fca2a3de482 (unknown)
>>>     @ 0x7fca2a3dae70 (unknown)
>>>     @ 0x7fca2a3dc100 (unknown)
>>>     @ 0x7fca2a429a44 (unknown)
>>>     @ 0x7fca2a42ab47 (unknown)
>>>     @ 0x7fca2a42e94c (unknown)
>>>     @ 0x7fca2a43081c (unknown)
>>>     @ 0x7fca2a5a9a56 (unknown)
>>>     @ 0x7fca2a5aa948 (unknown)
>>>     @ 0x7fca2a41ac8b (unknown)
>>>     @ 0x7fca2a4dcfc8 (unknown)
>>>     @ 0x7fca290d6182 start_thread
>>>     @ 0x7fca2980947d clone
>>>     @ (nil) (unknown)
>>>
>>>
>>> On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <[email protected]> wrote:
>>>
>>> Is that 4TB per tablet server, regardless of how many tablets it has?
>>>
>>> If I have 128GB of data per day, then each tablet server hits the
>>> recommended limit after about a month. To store 10 years of data, I would
>>> need 120 tablet servers to avoid going over the limit. Is that the best
>>> solution or is there another alternative?
>>>
>>> How many cores are recommended per tablet server? If I typically only
>>> scan one day of data at time, could a single core service multiple tablet
>>> servers?
>>>
>>>
>>> On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <[email protected]> wrote:
>>>
>>> The test doesn't exactly reproduce what I did in my sample program.
>>>
>>> I'm able to successfully drop the unbounded partition in both cases
>>> (calling set_range_partition_columns only vs calling
>>> set_range_partition_columns+add_hash_partitions). However, if I omit
>>> the call to DropRangePartition, then AddRangePartition succeeds in the
>>> first case and fails in the second case. I expect it to succeed in both
>>> cases or fail in both cases.
>>>
>>> I've attached a simple program which demonstrates.
>>>
>>>
>>> On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <[email protected]>
>>> wrote:
>>>
>>> Hi Paul,
>>>
>>> I can't reproduce the behavior you are describing, I always get a single
>>> unbounded range partition when creating the table without specifying range
>>> bounds or splits (regardless of hash partitioning). I searched and couldn't
>>> find a unit test for this behavior, so I wrote one - you might compare your
>>> code against my test. https://gerrit.cloudera.org/#/c/6153/
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <[email protected]> wrote:
>>>
>>> I can verify that dropping the unbounded range partition allows me to
>>> later add bounded partitions.
>>>
>>> If I only have range partitioning (by commenting out the call to
>>> add_hash_partitions), adding a bounded partition succeeds, regardless of
>>> whether I first drop the unbounded partition. This seems surprising; why
>>> the difference?
>>>
>>> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <[email protected]>
>>> wrote:
>>>
>>> Hi Paul,
>>>
>>> I think the issue you are running into is that if you don't add a range
>>> partition explicitly during table creation (by calling add_range_partition
>>> or inserting a split with add_range_partition_split), Kudu will default to
>>> creating 1 unbounded range partition. So your two options are to add the
>>> range partition during table creation time, or if you only know that
>>> partition you want at a later time, you can drop the existing partition
>>> (alterer->DropRangePartition with two empty rows), then add the range
>>> partition. Note that dropping the range partition will effectively
>>> truncate the table. This can be done with the same alterer in a single
>>> transaction. If you want to see a bunch of examples, you can check out
>>> this unit test:
>>> https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106.
>>>
>>> - Dan
>>>
>>> On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <[email protected]> wrote:
>>>
>>> I'm trying to create a table with one-column range-partitioned and
>>> another column hash-partitioned. Documentation for add_hash_partitions and
>>> set_range_partition_columns suggest this should be possible ("Tables must
>>> be created with either range, hash, or range and hash partitioning").
>>>
>>> I have a schema with three INT64 columns ("time", "key", and "value").
>>> When I create the table, I set up the partitioning:
>>>
>>> (*table_creator)
>>>     .table_name("test_table")
>>>     .schema(&schema)
>>>     .add_hash_partitions({"key"}, 2)
>>>     .set_range_partition_columns({"time"})
>>>     .num_replicas(1)
>>>     .Create()
>>>
>>> I later try to add a partition:
>>>
>>> auto timesplit(KuduSchema & schema, std::int64_t t) {
>>>     auto split = schema.NewRow();
>>>     check_ok(split->SetInt64("time", t));
>>>     return split;
>>> }
>>>
>>> alterer->AddRangePartition(
>>>     timesplit(schema, date_start),
>>>     timesplit(schema, next_date_start));
>>>
>>> check_ok(alterer->Alter());
>>>
>>> But I get an error "Invalid argument: New range partition conflicts with
>>> existing range partition".
>>>
>>> How are hash and range partitioning intended to be mixed?
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
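
For anyone reading this thread later, here is a rough sketch of why the original AddRangePartition call was rejected and why dropping the default partition first makes it work. It is a toy model only, not Kudu code and not how Kudu implements the check: Range, overlaps, and ToyTable are hypothetical names made up for illustration, and a range is assumed to be a half-open interval [lower, upper) on the "time" column with a missing bound meaning "unbounded".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only (hypothetical, not part of the Kudu client API).
// A half-open range [lower, upper); std::nullopt means that side is unbounded.
struct Range {
    std::optional<std::int64_t> lower;
    std::optional<std::int64_t> upper;
};

// Two half-open ranges overlap unless one ends at or before the other begins.
// An unbounded side can never "end before" or "begin after" anything.
bool overlaps(const Range& a, const Range& b) {
    bool a_before_b = a.upper && b.lower && *a.upper <= *b.lower;
    bool b_before_a = b.upper && a.lower && *b.upper <= *a.lower;
    return !(a_before_b || b_before_a);
}

// Hypothetical stand-in for the range partitions of one table.
class ToyTable {
public:
    // Mirrors the default described in the thread: when no range partition
    // is given at creation time, the table starts with one unbounded range.
    ToyTable() : ranges_{Range{std::nullopt, std::nullopt}} {}

    // Returns false when the new range conflicts with an existing one.
    bool add_range(const Range& r) {
        // A fully bounded range must be non-empty.
        assert(!r.lower || !r.upper || *r.lower < *r.upper);
        for (const Range& existing : ranges_) {
            if (overlaps(existing, r)) return false;
        }
        ranges_.push_back(r);
        return true;
    }

    // Removes the range whose bounds match exactly; returns false if absent.
    bool drop_range(const Range& r) {
        for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
            if (it->lower == r.lower && it->upper == r.upper) {
                ranges_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Range> ranges_;
};
```

In this model, adding a one-day range to a freshly created ToyTable fails because it overlaps the default unbounded range; after drop_range with both bounds empty, the same add_range call succeeds. A range with only its lower bound set (the situation the copy/paste error produced) extends to infinity, so it conflicts with any existing range that reaches past its lower bound.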
