Re: mixing range and hash partitioning

Paul Brannan Sun, 26 Feb 2017 15:54:07 -0800

Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the
recommended limit after about a month.  To store 10 years of data, I would
need 120 tablet servers to avoid going over the limit.  Is that the best
solution or is there another alternative?
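
For reference, here is the arithmetic behind those numbers as a rough sketch. It assumes 4TB means 4096GB, that the 128GB/day figure is the on-disk size, and that the replication factor multiplies the storage needed. The function names are made up for illustration and are not part of any Kudu API.

```cpp
#include <cassert>
#include <cstdint>

// Whole days until one server ingesting `daily_gb` per day reaches `limit_gb`.
// Hypothetical helper, for illustration only.
inline std::int64_t days_to_limit(std::int64_t limit_gb, std::int64_t daily_gb) {
  assert(limit_gb > 0 && daily_gb > 0);
  return limit_gb / daily_gb;
}

// Number of servers needed to hold `total_days` of data without any server
// exceeding `limit_gb`, rounding up. `replicas` is the replication factor
// (assumed to multiply the total storage).
inline std::int64_t servers_needed(std::int64_t total_days,
                                   std::int64_t daily_gb,
                                   std::int64_t limit_gb,
                                   std::int64_t replicas) {
  assert(total_days >= 0 && daily_gb > 0 && limit_gb > 0 && replicas > 0);
  const std::int64_t total_gb = total_days * daily_gb * replicas;
  return (total_gb + limit_gb - 1) / limit_gb;  // ceiling division
}
```

With these assumptions, days_to_limit(4096, 128) gives 32 days, and servers_needed(3650, 128, 4096, 1) gives 115 servers for ten years of unreplicated data, which is in line with my estimate of roughly 120. With 3 replicas the count roughly triples.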


How many cores are recommended per tablet server?  If I typically only scan
one day of data at a time, could a single core service multiple tablet
servers?


On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <[email protected]>
wrote:

> The test doesn't exactly reproduce what I did in my sample program.
>
> I'm able to successfully drop the unbounded partition in both cases
> (calling set_range_partition_columns only vs calling
> set_range_partition_columns+add_hash_partitions).  However, if I omit the
> call to DropRangePartition, then AddRangePartition succeeds in the first
> case and fails in the second case.  I expect it to succeed in both cases or
> fail in both cases.
>
> I've attached a simple program which demonstrates.
>
>
> On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <[email protected]>
> wrote:
>
>> Hi Paul,
>>
>> I can't reproduce the behavior you are describing; I always get a single
>> unbounded range partition when creating the table without specifying range
>> bounds or splits (regardless of hash partitioning). I searched and couldn't
>> find a unit test for this behavior, so I wrote one - you might compare your
>> code against my test. https://gerrit.cloudera.org/#/c/6153/
>>
>> Thanks,
>> Dan
>>
>> On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <
>> [email protected]> wrote:
>>
>>> I can verify that dropping the unbounded range partition allows me to
>>> later add bounded partitions.
>>>
>>> If I only have range partitioning (by commenting out the call to
>>> add_hash_partitions), adding a bounded partition succeeds, regardless of
>>> whether I first drop the unbounded partition.  This seems surprising; why
>>> the difference?
>>>
>>> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <[email protected]>
>>> wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> I think the issue you are running into is that if you don't add a range
>>>> partition explicitly during table creation (by calling add_range_partition
>>>> or inserting a split with add_range_partition_split), Kudu will default to
>>>> creating 1 unbounded range partition.  So your two options are to add the
>>>> range partition during table creation time, or if you only know which
>>>> partition you want at a later time, you can drop the existing partition
>>>> (alterer->DropRangePartition with two empty rows), then add the range
>>>> partition.  Note that dropping the range partition will effectively
>>>> truncate the table.  This can be done with the same alterer in a single
>>>> transaction.  If you want to see a bunch of examples, you can check out
>>>> this unit test:
>>>> https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
>>>>
>>>> - Dan
>>>>
>>>> On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <
>>>> [email protected]> wrote:
>>>>
>>>>> I'm trying to create a table with one column range-partitioned and
>>>>> another column hash-partitioned.  The documentation for
>>>>> add_hash_partitions and set_range_partition_columns suggests this should
>>>>> be possible ("Tables must be created with either range, hash, or range
>>>>> and hash partitioning").
>>>>>
>>>>> I have a schema with three INT64 columns ("time", "key", and
>>>>> "value").  When I create the table, I set up the partitioning:
>>>>>
>>>>> (*table_creator)
>>>>>   .table_name("test_table")
>>>>>   .schema(&schema)
>>>>>   .add_hash_partitions({"key"}, 2)
>>>>>   .set_range_partition_columns({"time"})
>>>>>   .num_replicas(1)
>>>>>   .Create();
>>>>>
>>>>> I later try to add a partition:
>>>>>
>>>>> auto timesplit(KuduSchema & schema, std::int64_t t) {
>>>>>   auto split = schema.NewRow();
>>>>>   check_ok(split->SetInt64("time", t));
>>>>>   return split;
>>>>> }
>>>>>
>>>>> alterer->AddRangePartition(
>>>>>   timesplit(schema, date_start),
>>>>>   timesplit(schema, next_date_start));
>>>>>
>>>>> check_ok(alterer->Alter());
>>>>>
>>>>> But I get an error "Invalid argument: New range partition conflicts
>>>>> with existing range partition".
>>>>>
>>>>> How are hash and range partitioning intended to be mixed?
>>>>>
>>>>>
>>>>
>>>
>>
>
