Re: How to decrease kudu server restart time

Gary Gao Wed, 15 Aug 2018 03:22:06 -0700

The output of the command [kudu local_replica data_size] is shown below, but
it seems that **Total live blocks** is the total number of blocks for the
whole table, not for a specific tablet:


Total live blocks: 22515001
Total live bytes: 1362248371390
Total live bytes (after alignment): 1446784176128
Total number of LBM containers: 22403 (17366 full)
.....
.....
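
As a rough sanity check, the averages implied by these totals can be worked out directly. This is only a sketch: the variable names are illustrative, and the figures are copied from the output above.

```python
# Figures copied from the `kudu local_replica data_size` totals above.
live_blocks = 22_515_001
live_bytes = 1_362_248_371_390
aligned_bytes = 1_446_784_176_128
containers = 22_403

# Average size of one live block, in bytes.
avg_block_bytes = live_bytes / live_blocks
# Average number of live blocks stored per LBM container.
blocks_per_container = live_blocks / containers
# Extra space consumed by alignment, as a fraction of the live bytes.
alignment_overhead = (aligned_bytes - live_bytes) / live_bytes

print(f"average block size:   {avg_block_bytes / 1024:.1f} KiB")
print(f"blocks per container: {blocks_per_container:.0f}")
print(f"alignment overhead:   {alignment_overhead:.1%}")
```

That works out to roughly 59 KiB per block, about 1,005 blocks per container, and about 6% alignment overhead.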


table schema:

create table venus.ods_xk_pay_fee_order(
time_day bigint,
CREATETIME BIGINT,
BUYERID BIGINT,
SELLERID BIGINT,
ORDERID String,
BIZID BIGINT,
ID BIGINT,
SELLERFAMILYID BIGINT,
PRODUCTID BIGINT,
PRODUCTTYPE BIGINT,
PRICE BIGINT,
REALPRICE BIGINT,
DISCOUNT BIGINT,
SHARERATE BIGINT,
DEVICETYPE BIGINT,
DEVICEID String,
APPID BIGINT,
PKNAME String,
APPVERSION String,
CREATEIP BIGINT,
SERIALID String,
SCID String,
COMPLETESTATUS BIGINT,
COMPLETETIME BIGINT,
TRYCOUNT BIGINT,
APPCHANNEL String,
SDKID BIGINT,
LIVESTATUS BIGINT,
PAYSTATUS BIGINT,
THRIDORDERID String,
LIVESOURCE BIGINT,
LIVEPRODUCTTYPE BIGINT,
PAYMODE BIGINT,
SUBPRODUCTTYPE BIGINT,
SALETYPE BIGINT,
primary key(time_day, createtime, buyerid, sellerid, orderid, bizid, id))
partition by hash (time_day, createtime, buyerid, sellerid, orderid, bizid, id) partitions 3,
range(time_day)(PARTITION 1483200000 <= values < 1514736000, ...)
stored as kudu



There are only 3.3 million records [in 3 tablets] in this table, and fewer
than 50 thousand records are ingested into it every day, with many
updates.
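
To put the block counts next to the row counts, here is a minimal sketch. It assumes the 22,515,001 live blocks reported above cover the same 3.3 million rows, which is exactly the open question, and it uses the tablet from the quoted Tablet Copy status (1,177,225 blocks, fewer than 500 thousand records) as a second data point. The variable names are illustrative.

```python
# Table-level figures from this message (row count is approximate).
live_blocks = 22_515_001
total_rows = 3_300_000

# Single-tablet figures from the quoted Tablet Copy status line.
tablet_blocks = 1_177_225
tablet_rows_upper_bound = 500_000  # "less than 500 thousands of records"

blocks_per_row = live_blocks / total_rows
# The row count is an upper bound, so this ratio is a lower bound.
min_tablet_blocks_per_row = tablet_blocks / tablet_rows_upper_bound

print(f"table:  {blocks_per_row:.1f} blocks per row")
print(f"tablet: at least {min_tablet_blocks_per_row:.1f} blocks per row")
```

Either way there are several blocks per row, which is far more than the row count alone would suggest.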


I dug into the Kudu flag configuration and found the following flags
related to **BLOCK_SIZE**. What are the recommended values for these flags?

--cfile_default_block_size=262144

--deltafile_default_block_size=32768

--default_composite_key_index_block_size_bytes=4096

--tablet_bloom_block_size=4096
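
For readability, the same values can be converted to KiB. This is a small sketch with the numbers copied from the flags above; it only converts units and does not imply any recommended setting.

```python
# Flag values (in bytes) copied from the list above.
flags = {
    "cfile_default_block_size": 262144,
    "deltafile_default_block_size": 32768,
    "default_composite_key_index_block_size_bytes": 4096,
    "tablet_bloom_block_size": 4096,
}

# Convert each value from bytes to whole KiB.
sizes_kib = {name: value // 1024 for name, value in flags.items()}

for name, kib in sizes_kib.items():
    print(f"--{name}: {kib} KiB")
```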



On Tue, Aug 14, 2018 at 5:41 AM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> > Even if the kudu server started, it also spent too much copying tablet,
> > as the following tablet block copying log:
> >
> >
> > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
> >   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
> >     State:       INITIALIZED
> >     Data state:  TABLET_DATA_COPYING
> >     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
> >   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
> >   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
>
> I see that this tablet has over a million blocks, but how are you
> measuring that it's spending too much time copying? How much time did
> it take to fully copy this tablet?
>
> > 1. It seems kudu server spent a long time to open log block container,
> > how to speed up restarting kudu server ?
>
> Your Kudu server log should contain some log messages that'll help us
> understand what's going on. Look for a message like "Time spent
> opening block manager" and paste that.  Also can you find and paste
> the "FS layout report"?
>
> In general, the more blocks (and thus block containers) you have, the
> longer it'll take Kudu to restart. KUDU-2014 has some ideas on how we
> might improve this.
>
> Once a tserver is deemed dead and its data is rereplicated elsewhere,
> you can just reformat the node (i.e. delete the contents of the WAL,
> metadata, and data directories). Its contents are no longer necessary,
> and this will reset the number of log block containers to 0, which
> will speed up subsequent restarts.
>
> > 2. I think the number of blocks have an influence on kudu server
> > restarting time and query time on specific tablet, more number of blocks,
> > more restarting time and query time. Is this right ?
>
> Yes to restarting time, but not necessarily to query time. It really
> depends on the kinds of queries you're issuing, how many predicates
> they have, etc.
>
> > 3. Why there are more than 1 million blocks in a tablet, as shown in
> > above Tablet Copy log, while there are less than 500 thousands of records
> > in the tablet ?
>
> That's an excellent question. What kind of write workload do you have?
> What's your table schema and partitioning? Do you have any
> non-standard flags defined that may affect how Kudu flushes or
> compacts its data?
>
> I'd also suggest running the CLI tool 'kudu local_replica data_size'
> on that large replica you described above. It will help identify
> whether this is a case of very large tablets, or just high numbers of
> blocks.
>
> > 4. How to reduce the number of block in tablet ?
>
> Once you answer the questions I posed just above, I might be able to
> offer some recommendations for how to reduce the overall number of
> blocks.
>
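
The two log messages requested in the quoted reply ("Time spent opening block manager" and "FS layout report") can be pulled out of a tablet server log with a short script. This is a minimal sketch: the function name is illustrative, the log path is site-specific and must be passed in, and only lines containing the marker text are printed, so a report that continues over several lines may still need to be read from the log itself.

```python
import sys
from typing import Iterable, List

# Marker strings taken from the quoted reply.
MARKERS = ("Time spent opening block manager", "FS layout report")


def find_startup_messages(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain any of the marker strings."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in MARKERS)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: path to the tablet server log (site-specific).
    with open(sys.argv[1], errors="replace") as log:
        for hit in find_startup_messages(log):
            print(hit)
```

Run it with the log file as the only argument; with no argument it does nothing.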
