I'm using Kudu 1.6.0. Does this version have the feature you mentioned?

> The recent versions are using 3-4-3 replica replacement, meaning the tablet
> copy should be automatically canceled when the third replica comes online
> and the copy hasn't finished yet.

On Mon, Aug 13, 2018 at 5:16 PM Attila Bukor <[email protected]> wrote:

> Hi Gary,
>
> Please find my answers inline.
>
> On Sun, Aug 12, 2018 at 01:17:03PM +0800, Gary Gao wrote:
> > I have a kudu cluster of 40 nodes, when I realized that
> > maintenance_manager_num_threads=1 is too small, I updated config file and
> > restarted a kudu tablet server, but it took too long to start, longer than
> > --follower_unavailable_considered_failed_sec=600, causing tablet
> > redistribution.
> > Even if the kudu server started, it also spent too much copying tablet, as
> > the following tablet block copying log:
> >
> >
> > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
> > 41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
> > State: INITIALIZED
> > Data state: TABLET_DATA_COPYING
> > Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
> > 52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
> > b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
> >
>
> Which version are you using? The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.
>
> >
> > My Question are:
> >
> > 1. It seems kudu server spent a long time to open log block container, how
> > to speed up restarting kudu server ?
>
> The startup time of the tablet servers mostly depends on the number of
> tablets hosted on the server. I'm not sure if there's any way to tune
> it, aside from reducing the number of tablets. How many tablets do you
> have per tablet server?
>
> >
> > 2. I think the number of blocks have an influence on kudu server restarting
> > time and query time on specific tablet, more number of blocks, more
> > restarting time and query time. Is this right ?
>
> I'm not sure how much the number of blocks influences the restart time,
> maybe someone else can shed some light on this one. I'd focus on the
> number of tablets though.
>
> The query latencies depend on how many blocks the server needs to read
> from, but it's a matter of how well the data is compacted (either by
> sequential writes instead of random writes, or whether the maintenance
> managers compacted them), rather than the number of total blocks.
>
> >
> > 3. Why there are more than 1 million blocks in a tablet, as shown in above
> > Tablet Copy log, while there are less than 500 thousands of records in the
> > tablet ?
> >
>
> Each rowset will have multiple blocks (one per column, UNDO and
> REDO deltas, and bloom filters). The number of rowsets depends on the
> number of rows.
>
> > 4. How to reduce the number of block in tablet ?
>
> The maintenance managers perform compactions that reduce the number of
> blocks per tablets. Other than this, less columns or less rows also
> results in less blocks of course.
>
> - Attila
>
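
For reference, the counter in the quoted "Last status" line can be turned into a rough copy-progress figure. This is only a minimal sketch: it assumes the pair in parentheses means blocks downloaded so far / total blocks (which is how the thread reads it), and that the status line has been rejoined onto one line if the mail client wrapped it. The helper name `copy_progress` is hypothetical and not part of Kudu.

```python
import re

# Matches e.g. "Downloading block 0000000084111077 (299837/1177225)".
# Assumption: the numbers in parentheses are (blocks done / total blocks).
STATUS_RE = re.compile(r"Downloading block (\d+)\s*\((\d+)/(\d+)\)")


def copy_progress(status):
    """Return (block_id, done, total, fraction), or None if the line does not match."""
    match = STATUS_RE.search(status)
    if match is None:
        return None
    block_id = match.group(1)
    done = int(match.group(2))
    total = int(match.group(3))
    if total == 0:
        return None  # avoid dividing by zero on a malformed line
    return block_id, done, total, done / total


if __name__ == "__main__":
    line = ("Last status: Tablet Copy: Downloading block "
            "0000000084111077 (299837/1177225)")
    result = copy_progress(line)
    if result is not None:
        block_id, done, total, fraction = result
        print(f"block {block_id}: {done}/{total} blocks ({fraction:.1%})")
```

For the status line quoted above, this works out to roughly a quarter of the blocks copied at the time the check was run.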
