One side effect of neglecting to drop the unbounded range partition: I get
a stack trace when I try to scan:

F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it !=
collection.end() Map key not found: ▒3
*** Check failure stack trace: ***
    @     0x7fca2a5506ad  (unknown)
    @     0x7fca2a55271c  (unknown)
    @     0x7fca2a550209  (unknown)
    @     0x7fca2a5530af  (unknown)
    @     0x7fca2a3de482  (unknown)
    @     0x7fca2a3dae70  (unknown)
    @     0x7fca2a3dc100  (unknown)
    @     0x7fca2a429a44  (unknown)
    @     0x7fca2a42ab47  (unknown)
    @     0x7fca2a42e94c  (unknown)
    @     0x7fca2a43081c  (unknown)
    @     0x7fca2a5a9a56  (unknown)
    @     0x7fca2a5aa948  (unknown)
    @     0x7fca2a41ac8b  (unknown)
    @     0x7fca2a4dcfc8  (unknown)
    @     0x7fca290d6182  start_thread
    @     0x7fca2980947d  clone
    @              (nil)  (unknown)
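
For reference, the kind of scan I mean is roughly the sketch below (sketch
only, not the actual program; it reuses the check_ok helper and client handle
from my earlier snippets, and the names are illustrative):

kudu::client::sp::shared_ptr<KuduTable> table;
check_ok(client->OpenTable("test_table", &table));

KuduScanner scanner(table.get());
check_ok(scanner.Open());
KuduScanBatch batch;
while (scanner.HasMoreRows()) {
  check_ok(scanner.NextBatch(&batch));
  for (KuduScanBatch::RowPtr row : batch) {
    std::int64_t t;
    check_ok(row.GetInt64("time", &t));
  }
}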


On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <[email protected]>
wrote:

> Is that 4TB per tablet server, regardless of how many tablets it has?
>
> If I have 128GB of data per day, then each tablet server hits the
> recommended limit after about a month.  To store 10 years of data, I would
> need 120 tablet servers to avoid going over the limit.  Is that the best
> solution or is there another alternative?
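>
> (Rough arithmetic behind those numbers: 128 GB/day * ~31 days is about 4 TB,
> so a single tablet server reaches the recommended limit in about a month;
> over 10 years, 128 GB/day * ~3,650 days is about 467 TB, and 467 TB / 4 TB
> per server comes to ~117, i.e. roughly 120 tablet servers.)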
>
> How many cores are recommended per tablet server?  If I typically only
> scan one day of data at a time, could a single core service multiple tablet
> servers?
>
>
> On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <
> [email protected]> wrote:
>
>> The test doesn't exactly reproduce what I did in my sample program.
>>
>> I'm able to successfully drop the unbounded partition in both cases
>> (calling set_range_partition_columns only vs calling
>> set_range_partition_columns+add_hash_partitions).  However, if I omit
>> the call to DropRangePartition, then AddRangePartition succeeds in the
>> first case and fails in the second case.  I expect it to succeed in both
>> cases or fail in both cases.
>>
>> I've attached a simple program which demonstrates this.
>>
>>
>> On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <[email protected]>
>> wrote:
>>
>>> Hi Paul,
>>>
>>> I can't reproduce the behavior you are describing; I always get a single
>>> unbounded range partition when creating the table without specifying range
>>> bounds or splits (regardless of hash partitioning). I searched and couldn't
>>> find a unit test for this behavior, so I wrote one - you might compare your
>>> code against my test: https://gerrit.cloudera.org/#/c/6153/
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <
>>> [email protected]> wrote:
>>>
>>>> I can verify that dropping the unbounded range partition allows me to
>>>> later add bounded partitions.
>>>>
>>>> If I only have range partitioning (by commenting out the call to
>>>> add_hash_partitions), adding a bounded partition succeeds, regardless of
>>>> whether I first drop the unbounded partition.  This seems surprising; why
>>>> the difference?
>>>>
>>>> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I think the issue you are running into is that if you don't add a
>>>>> range partition explicitly during table creation (by calling
>>>>> add_range_partition or inserting a split with add_range_partition_split),
>>>>> Kudu will default to creating a single unbounded range partition.  So
>>>>> your two options are to add the range partition at table creation time,
>>>>> or, if you only know the partition you want at a later time, to drop the
>>>>> existing partition (alterer->DropRangePartition with two empty rows) and
>>>>> then add the range partition.  Note that dropping the range partition
>>>>> will effectively truncate the table.  Both steps can be done with the
>>>>> same alterer in a single transaction, as sketched below.  If you want to
>>>>> see a bunch of examples, you can check out this unit test:
>>>>> https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
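>>>>>
>>>>> A minimal sketch of that drop-then-add flow, reusing the timesplit and
>>>>> check_ok helpers from your snippet (the client/table names here are
>>>>> illustrative):
>>>>>
>>>>> std::unique_ptr<KuduTableAlterer> alterer(
>>>>>     client->NewTableAlterer("test_table"));
>>>>> // Two empty rows identify the default unbounded range partition.
>>>>> alterer->DropRangePartition(schema.NewRow(), schema.NewRow());
>>>>> alterer->AddRangePartition(
>>>>>     timesplit(schema, date_start),
>>>>>     timesplit(schema, next_date_start));
>>>>> // The drop and the add are applied together in the same Alter() call.
>>>>> check_ok(alterer->Alter());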
>>>>>
>>>>> - Dan
>>>>>
>>>>> On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I'm trying to create a table with one column range-partitioned and
>>>>>> another column hash-partitioned.  The documentation for
>>>>>> add_hash_partitions and set_range_partition_columns suggests this should
>>>>>> be possible ("Tables must be created with either range, hash, or range
>>>>>> and hash partitioning").
>>>>>>
>>>>>> I have a schema with three INT64 columns ("time", "key", and
>>>>>> "value").  When I create the table, I set up the partitioning:
>>>>>>
>>>>>> (*table_creator)
>>>>>>   .table_name("test_table")
>>>>>>   .schema(&schema)
>>>>>>   .add_hash_partitions({"key"}, 2)
>>>>>>   .set_range_partition_columns({"time"})
>>>>>>   .num_replicas(1)
>>>>>>   .Create();
>>>>>>
>>>>>> I later try to add a partition:
>>>>>>
>>>>>> // Builds a row bound on "time"; AddRangePartition takes ownership of it.
>>>>>> auto timesplit(KuduSchema & schema, std::int64_t t) {
>>>>>>   auto split = schema.NewRow();
>>>>>>   check_ok(split->SetInt64("time", t));
>>>>>>   return split;
>>>>>> }
>>>>>>
>>>>>> alterer->AddRangePartition(
>>>>>>   timesplit(schema, date_start),
>>>>>>   timesplit(schema, next_date_start));
>>>>>>
>>>>>> check_ok(alterer->Alter());
>>>>>>
>>>>>> But I get an error "Invalid argument: New range partition conflicts
>>>>>> with existing range partition".
>>>>>>
>>>>>> How are hash and range partitioning intended to be mixed?
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
