[jira] [Resolved] (KUDU-26) Handle corrupt Tablets at startup

2019-11-19 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-26.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

This was fixed a while ago in a series of commits.

> Handle corrupt Tablets at startup
> -
>
> Key: KUDU-26
> URL: https://issues.apache.org/jira/browse/KUDU-26
> Project: Kudu
>  Issue Type: Improvement
>  Components: tablet, tserver
>Affects Versions: M3
>Reporter: Todd Lipcon
>Priority: Major
> Fix For: 1.5.0
>
>
> Currently if any tablet fails to load at startup, the whole tserver fails to 
> start. Instead, it should mark those tablets as being hosted by the server, 
> but in a CORRUPT state. This will help admins find the issue and hopefully 
> debug/recover, instead of causing cluster-wide downtime.
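
To illustrate the behavior described above, here is a simplified sketch (hypothetical types and names, not Kudu's actual tablet manager code) of recording a failed tablet as CORRUPT instead of aborting tserver startup:

{code:cpp}
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real tablet-manager types.
enum class TabletState { RUNNING, CORRUPT };

// Stand-in for the real bootstrap path; returns false if the tablet's
// metadata or data cannot be loaded.
bool LoadTablet(const std::string& tablet_id) {
  // ... bootstrap the tablet; report failure instead of crashing ...
  return true;
}

void StartupAllTablets(const std::vector<std::string>& tablet_ids,
                       std::map<std::string, TabletState>* states) {
  for (const auto& id : tablet_ids) {
    // Instead of failing the whole server on the first bad tablet, remember
    // that it is corrupt so operators can inspect and repair it later.
    (*states)[id] = LoadTablet(id) ? TabletState::RUNNING
                                   : TabletState::CORRUPT;
  }
}
{code}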





[jira] [Commented] (KUDU-2971) Add a generic Java library wrapper

2019-11-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977938#comment-16977938
 ] 

ASF subversion and git services commented on KUDU-2971:
---

Commit 68f9fbc420a7ade895ffa639971978713fbbee4f in kudu's branch 
refs/heads/master from Hao Hao
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=68f9fbc ]

KUDU-2971 p1: add subprocess module

Utility classes exist that allow for IPC over stdin/stdout via protobuf
and JSON-encoded protobuf. This commit moves those classes into their
own directory so they can be reused by other subprocesses.

Following commits can then extend the module to support concurrent
communication with a subprocess. There are no functional changes in this patch.

Change-Id: If73e27772e1897a04f04229c4906a24c61e361f2
Reviewed-on: http://gerrit.cloudera.org:8080/14425
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong 
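
For context, the IPC pattern here boils down to framed messages exchanged over the child's stdin/stdout. The sketch below is illustrative only: the 4-byte length prefix and function names are assumptions rather than the actual Kudu subprocess protocol, and the payload stands in for a serialized (or JSON-encoded) protobuf.

{code:cpp}
// Minimal sketch of length-prefixed message framing over stdin/stdout.
// Assumption: a 4-byte big-endian length prefix followed by the payload,
// which would be a serialized protobuf in the real module.
#include <arpa/inet.h>   // htonl/ntohl
#include <cstdint>
#include <iostream>
#include <string>

// Read one framed message from 'in'; returns false on EOF or error.
bool ReadFrame(std::istream& in, std::string* payload) {
  uint32_t be_len;
  if (!in.read(reinterpret_cast<char*>(&be_len), sizeof(be_len))) return false;
  uint32_t len = ntohl(be_len);
  payload->resize(len);
  return static_cast<bool>(in.read(&(*payload)[0], len));
}

// Write one framed message to 'out'.
void WriteFrame(std::ostream& out, const std::string& payload) {
  uint32_t be_len = htonl(static_cast<uint32_t>(payload.size()));
  out.write(reinterpret_cast<const char*>(&be_len), sizeof(be_len));
  out.write(payload.data(), payload.size());
  out.flush();
}

int main() {
  // Echo loop: the parent writes a framed request to our stdin and reads the
  // framed response from our stdout; here we simply echo the payload back.
  std::string msg;
  while (ReadFrame(std::cin, &msg)) {
    WriteFrame(std::cout, msg);
  }
  return 0;
}
{code}

A parent process would spawn such a binary, write a framed request to its stdin, and read the framed response back from its stdout.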


> Add a generic Java library wrapper
> --
>
> Key: KUDU-2971
> URL: https://issues.apache.org/jira/browse/KUDU-2971
> Project: Kudu
>  Issue Type: Sub-task
>Affects Versions: 1.11.0
>Reporter: Hao Hao
>Assignee: Hao Hao
>Priority: Major
>
> For Ranger integration, to call the Java Ranger plugin from the masters, we 
> need to create a wrapper (via a Java subprocess). This should be generic 
> enough to be used by future integrations (e.g. Atlas) that need to call other 
> Java libraries.





[jira] [Created] (KUDU-3003) TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky

2019-11-19 Thread Hao Hao (Jira)
Hao Hao created KUDU-3003:
-

 Summary: 
TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky
 Key: KUDU-3003
 URL: https://issues.apache.org/jira/browse/KUDU-3003
 Project: Kudu
  Issue Type: Bug
Reporter: Hao Hao
 Attachments: test-output.txt

testTabletCacheInvalidatedDuringWrites of the 
org.apache.kudu.client.TestAsyncKuduSession test sometimes fails with an error 
like the one below. I have attached the full test log.
{noformat}
There was 1 failure:
1) 
testTabletCacheInvalidatedDuringWrites(org.apache.kudu.client.TestAsyncKuduSession)
org.apache.kudu.client.PleaseThrottleException: all buffers are currently 
flushing
at 
org.apache.kudu.client.AsyncKuduSession.apply(AsyncKuduSession.java:579)
at 
org.apache.kudu.client.TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites(TestAsyncKuduSession.java:371)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{noformat}





[jira] [Updated] (KUDU-3002) consider compactions as a mechanism to flush many DMSs

2019-11-19 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3002:
--
Description: 
When under memory pressure, we'll aggressively perform the maintenance 
operation that frees the most memory. Right now, the only ops that register 
memory are MRS and DMS flushes.

In practice, this means a couple of things:
 * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
since updates are spread across many DMSs and will therefore tend to be small, 
whereas any non-trivial insert workload will well up into a single MRS for an 
entire tablet.
 * We'll only flush a single DMS at a time to free memory. Because of this, and 
because we'll likely prioritize MRS flushes over DMS flushes, we may end up 
with a ton of tiny DMSs in a tablet that we'll never flush. This can end up 
bloating the WALs because each DMS may be anchoring some WAL segments.

A couple of thoughts on small things we can do to improve this:
 * Register the DMS size as ram anchored by a compaction. This will mean that 
we can schedule compactions to flush DMSs en masse. This would still mean that 
we could end up always prioritizing MRS flushes, depending on how quickly we're 
inserting.
 * We currently register the amount of disk space a LogGC would free up. We 
could do something similar, but register how many log anchors an op could 
release. This would be a bit trickier, since the log anchors aren't solely 
determined by the mem-stores (e.g. we'll anchor segments to catch up slow 
followers).
 * Introduce a new op (or change the flush-DMS op) that would flush as many 
DMSs as we can for a given tablet.

Between these, the first seems like it'd be an easy win.
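
To make the first idea concrete, here is a rough sketch of how a compaction op could report the DMS bytes it would flush as anchored memory, so the maintenance manager would consider it when picking the op that frees the most memory under pressure. The types and names below are hypothetical stand-ins, not Kudu's actual MaintenanceOp/MaintenanceOpStats interface.

{code:cpp}
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real maintenance-manager types.
struct OpStats {
  int64_t ram_anchored = 0;        // bytes of memory the op would release
  int64_t logs_retained_bytes = 0; // WAL bytes the op would allow to be GCed
  double perf_improvement = 0.0;
};

struct RowSetInfo {
  int64_t dms_size_bytes;          // size of the rowset's delta memstore
  bool selected_for_compaction;
};

// Sketch: when computing stats for a rowset compaction, also account for the
// DMS memory of the rowsets that would be flushed as a side effect, so the
// compaction competes with MRS/DMS flushes under memory pressure.
void UpdateCompactionStats(const std::vector<RowSetInfo>& rowsets,
                           OpStats* stats) {
  int64_t dms_bytes = 0;
  for (const auto& rs : rowsets) {
    if (rs.selected_for_compaction) {
      dms_bytes += rs.dms_size_bytes;
    }
  }
  stats->ram_anchored += dms_bytes;
}
{code}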

  was:
When under memory pressure, we'll aggressively perform the maintenance 
operation that frees the most memory. Right now, the only ops that register 
memory are MRS and DMS flushes.

In practice, this means a couple of things:
 * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
since updates are spread across many DMSs and will therefore tend to be small, 
whereas any non-trivial insert workload will well up into a single MRS for an 
entire tablet.
 * We'll only flush a single DMS at a time to free memory. Because of this, and 
because we'll likely prioritize MRS flushes over DMS flushes, we may end up 
with a ton of tiny DMSs in a tablet that we'll never flush. This can end up 
bloating the WALs because each DMS may be anchoring some WAL segments.

A couple of thoughts on small things we can do to improve this:
 * Register the DMS size as ram anchored by a compaction. This will mean that 
we can schedule compactions to flush DMSs en masse. This would still mean that 
we could end up always prioritizing MRS flushes, depending on how quickly we're 
inserting.
 * We currently register the amount of disk space a LogGC would free up. We 
could do something similar, but register how many log anchors an op could 
release. This would be a bit trickier, since the log anchors aren't solely 
determined by the mem-stores (e.g. we'll anchor segments to catch up slow 
followers).

Between the two, the first seems like it'd be an easy win.


> consider compactions as a mechanism to flush many DMSs
> --
>
> Key: KUDU-3002
> URL: https://issues.apache.org/jira/browse/KUDU-3002
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, tablet
>Reporter: Andrew Wong
>Priority: Major
>
> When under memory pressure, we'll aggressively perform the maintenance 
> operation that frees the most memory. Right now, the only ops that register 
> memory are MRS and DMS flushes.
> In practice, this means a couple of things:
>  * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
> since updates are spread across many DMSs and will therefore tend to be 
> small, whereas any non-trivial insert workload will well up into a single MRS 
> for an entire tablet.
>  * We'll only flush a single DMS at a time to free memory. Because of this, 
> and because we'll likely prioritize MRS flushes over DMS flushes, we may end 
> up with a ton of tiny DMSs in a tablet that we'll never flush. This can end 
> up bloating the WALs because each DMS may be anchoring some WAL segments.
> A couple of thoughts on small things we can do to improve this:
>  * Register the DMS size as ram anchored by a compaction. This will mean 
> that we can schedule compactions to flush DMSs en masse. This would still 
> mean that we could end up always prioritizing MRS flushes, depending on how 
> quickly we're inserting.
>  * We currently register the amount of disk space a LogGC would free up. We 
> could do something similar, but register how many log anchors an op could 
> release. This would be a bit trickier, since the log anchors aren't solely 

[jira] [Created] (KUDU-3002) consider compactions as a mechanism to flush many DMSs

2019-11-19 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3002:
-

 Summary: consider compactions as a mechanism to flush many DMSs
 Key: KUDU-3002
 URL: https://issues.apache.org/jira/browse/KUDU-3002
 Project: Kudu
  Issue Type: Improvement
  Components: perf, tablet
Reporter: Andrew Wong


When under memory pressure, we'll aggressively perform the maintenance 
operation that frees the most memory. Right now, the only ops that register 
memory are MRS and DMS flushes.

In practice, this means a couple of things:
 * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
since updates are spread across many DMSs and will therefore tend to be small, 
whereas any non-trivial insert workload will well up into a single MRS for an 
entire tablet.
 * We'll only flush a single DMS at a time to free memory. Because of this, and 
because we'll likely prioritize MRS flushes over DMS flushes, we may end up 
with a ton of tiny DMSs in a tablet that we'll never flush. This can end up 
bloating the WALs because each DMS may be anchoring some WAL segments.

A couple of thoughts on small things we can do to improve this:
 * Register the DMS size as ram anchored by a compaction. This will mean that 
we can schedule compactions to flush DMSs en masse. This would still mean that 
we could end up always prioritizing MRS flushes, depending on how quickly we're 
inserting.
 * We currently register the amount of disk space a LogGC would free up. We 
could do something similar, but register how many log anchors an op could 
release. This would be a bit trickier, since the log anchors aren't solely 
determined by the mem-stores (e.g. we'll anchor segments to catch up slow 
followers).

Between the two, the first seems like it'd be an easy win.





[jira] [Commented] (KUDU-2929) Don't starve compactions under memory pressure

2019-11-19 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977885#comment-16977885
 ] 

Andrew Wong commented on KUDU-2929:
---

Also, [~ZhangYao], good point about registering the DMS size as anchored 
memory. That seems especially important for update-heavy workloads. I've filed 
KUDU-3002 for it.

> Don't starve compactions under memory pressure
> --
>
> Key: KUDU-2929
> URL: https://issues.apache.org/jira/browse/KUDU-2929
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, tablet
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Major
> Fix For: 1.12.0
>
>
> When a server is under memory pressure, the maintenance manager will 
> exclusively look for the maintenance op that frees up the most memory. Some 
> operations, like compactions, do not register any amount of "anchored memory" 
> and effectively don't qualify for consideration.
> This means that when a tablet server is under memory pressure, compactions 
> will never be scheduled, even though compacting may actually end up reducing 
> memory (e.g. combining many rowsets' worth of CFileReaders into a single 
> rowset). While it makes sense to prefer flushes to compactions, it probably 
> doesn't make sense to do nothing rather than compact.
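
A simplified illustration of the selection behavior described above (hypothetical types, not the actual maintenance-manager code): under memory pressure only the anchored-memory figure is consulted, so an op that anchors no memory loses to any op that anchors some, no matter how useful it would be.

{code:cpp}
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a maintenance op's reported stats.
struct Op {
  std::string name;
  int64_t ram_anchored;     // memory the op would free
  double perf_improvement;  // how much the op would help performance
};

// Under memory pressure, pick the op that anchors the most memory; otherwise
// fall back to the op with the best performance improvement.
const Op* PickOp(const std::vector<Op>& ops, bool under_memory_pressure) {
  if (ops.empty()) return nullptr;
  if (under_memory_pressure) {
    // An op reporting ram_anchored == 0 (e.g. a compaction) loses to any op
    // that reports a non-zero amount, so it is effectively never scheduled.
    return &*std::max_element(ops.begin(), ops.end(),
        [](const Op& a, const Op& b) { return a.ram_anchored < b.ram_anchored; });
  }
  return &*std::max_element(ops.begin(), ops.end(),
      [](const Op& a, const Op& b) { return a.perf_improvement < b.perf_improvement; });
}
{code}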





[jira] [Commented] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed

2019-11-19 Thread Todd Lipcon (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1695#comment-1695
 ] 

Todd Lipcon commented on KUDU-38:
-

bq. Guaranteeing that every complete segment has a fully sync'ed index file 
makes for a nice invariant, but isn't it overkill for the task at hand? 
Couldn't we get away with sync'ing whichever index file contains the earliest 
anchored index at TabletMetadata flush time? I'm particularly concerned about 
the backwards compatibility implications: how do we establish this invariant 
after upgrading to a release including this fix? Or, how do we detect that it's 
not present in existing log index files?

I think we need to make sure that all prior indexes are also synced, because 
it's possible that there is a lagging peer that will still need to catch up 
from a very old record. The index file is what allows a leader to go find those 
old log entries and send them along. Without it, the old log segments aren't 
useful.

bq. I'm particularly concerned about the backwards compatibility implications: 
how do we establish this invariant after upgrading to a release including this 
fix? Or, how do we detect that it's not present in existing log index files?

Yep, we'd need to take that into account, e.g. by adding some new flag to the 
tablet metadata indicating that the indexes are durable or some such.

bq. Alternatively, what about forgoing the log index file and rather than 
storing the earliest anchored index in the TabletMetadata, storing the 
"physical index" (i.e. the LogIndexEntry corresponding to the anchor)?

Again per above, we can't be alive with an invalid index file, or else 
consensus won't be happy.


Ignoring the rest of your questions for a minute, let me throw out an 
alternative idea or two:

*Option 1:*

We could add a new, separate piece of metadata next to the logs called a "sync 
point" or some such (this could even be at a well-known offset in the existing 
log file). We could periodically wake up a background task for a log (e.g. when 
we see that the sync point has fallen too far behind) and then:
(1) look up the earliest durability-anchored offset
(2) msync the log indexes up to that point
(3) write that point to the special "sync point" metadata file. This is just an 
offset, so it can be written atomically and lazily flushed (it only moves 
forward)

At startup, if we see a sync point metadata file, we know we can start 
replaying (and reconstructing the index) from that point, without having to 
reconstruct any earlier index entries. If we do this lazily (e.g. once every 
few seconds, and only on actively-written tablets), the performance overhead 
should be negligible.
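
A rough sketch of step (3) above, persisting the sync point atomically. The file name and function are hypothetical, and real code would go through Kudu's Env abstraction rather than raw POSIX calls; steps (1) and (2) would run first, computing the earliest durability-anchored offset and syncing the index data up to it.

{code:cpp}
#include <cstdint>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// The sync point is a single offset, so it can be persisted atomically via
// write-temp-then-rename and only ever moves forward. (Offset is stored in
// host byte order here for brevity.)
bool PersistSyncPoint(const std::string& dir, int64_t offset) {
  const std::string tmp = dir + "/sync_point.tmp";
  const std::string final_path = dir + "/sync_point";
  int fd = open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  bool ok = write(fd, &offset, sizeof(offset)) ==
                static_cast<ssize_t>(sizeof(offset)) &&
            fsync(fd) == 0;
  close(fd);
  // rename() is atomic within a filesystem, so readers see either the old
  // sync point or the new one, never a torn write.
  return ok && std::rename(tmp.c_str(), final_path.c_str()) == 0;
}
{code}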

We also need to think about how this interacts with tablet copy -- right now, a 
newly copied tablet relies on replaying the WALs from the beginning because it 
doesn't copy log indexes. We may need to change that.

*Option 2*: get rid of "log index"
This is the "nuke everything from orbit" option: the whole log index thing was 
convenient but it's somewhat annoying for a number of reasons: (1) the issues 
described here, (2) we are using mmapped IO which is dangerous since IO errors 
crash the process, (3) just another bit of code to worry about and transfer 
around in tablet copy, etc.

The alternative is to embed the index in the WAL itself. One sketch of an 
implementation would be something like:
- divide the WAL into fixed-size pages, each with a header. The header would 
have term/index info and some kind of "continuation" flag for when entries span 
multiple pages. This is more or less the Postgres WAL design.
- this allows us to binary-search the WAL instead of having a separate index.
- we have to consider how truncations work -- I guess we would move to physical 
truncation.
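
To illustrate the binary-search idea (hypothetical page-header layout, not an actual format): if each fixed-size page header records the op index of the first entry that starts on that page, locating the page for a given index is a search over page headers rather than a lookup in a separate index file. In practice the headers would be read by seeking to page boundaries instead of being materialized up front.

{code:cpp}
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical header at the start of every fixed-size WAL page.
struct PageHeader {
  int64_t first_op_index;  // index of the first entry starting on this page
};

// Returns the page number whose range covers 'target', or -1 if the target
// precedes the first page. 'headers' is ordered by page number (and thus by
// op index).
int64_t FindPageForIndex(const std::vector<PageHeader>& headers,
                         int64_t target) {
  auto it = std::upper_bound(
      headers.begin(), headers.end(), target,
      [](int64_t t, const PageHeader& h) { return t < h.first_op_index; });
  if (it == headers.begin()) return -1;  // target predates the log
  return (it - headers.begin()) - 1;
}
{code}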

Another possible idea would be to not use fixed-size pages, but instead embed a 
tree structure into the WAL itself. For example, it wouldn't be too tough to 
add a back-pointer from each entry to the previous entry to enable backward 
scanning. If we then take a skip-list-like approach (n/2 nodes have a skip-2 
pointer, n/4 nodes have a skip-4 pointer, n/8 nodes have a skip-8 pointer, 
etc.), we can get logarithmic access time to past log entries. Again, we'd need 
to consider truncation.

Either of these options has the advantage that we no longer need to worry 
about indexes, but we still need to worry about figuring out where to start 
replaying from, and we could take the same strategy as the first suggestion for 
that.

> bootstrap should not replay logs that are known to be fully flushed
> ---
>
> Key: KUDU-38
> URL: https://issues.apache.org/jira/browse/KUDU-38
> Project: Kudu
>  Issue Type: Sub-task
>  Components: tablet
>Affects Versions: M3
>Reporter: Todd Lipcon
>  

[jira] [Created] (KUDU-3001) Multi-thread to load containers in a data directory

2019-11-19 Thread Yingchun Lai (Jira)
Yingchun Lai created KUDU-3001:
--

 Summary: Multi-thread to load containers in a data directory
 Key: KUDU-3001
 URL: https://issues.apache.org/jira/browse/KUDU-3001
 Project: Kudu
  Issue Type: Improvement
Reporter: Yingchun Lai
Assignee: Yingchun Lai


As [~tlipcon] mentioned in 
https://issues.apache.org/jira/browse/KUDU-2014, we can improve tserver startup 
time by loading the containers in a data directory with multiple threads.
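
A generic sketch of the idea (plain std::thread, not Kudu's actual ThreadPool or LogBlockManager code): a few worker threads claim containers from a shared counter and open them in parallel, and the caller joins them before proceeding with startup.

{code:cpp}
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for opening a single container: read its metadata
// and register its blocks.
void OpenContainer(const std::string& path) {
  // ... load the container's metadata and data files ...
}

void LoadContainersInParallel(const std::vector<std::string>& container_paths,
                              int num_threads) {
  std::atomic<size_t> next{0};
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&] {
      // Each worker repeatedly claims the next unprocessed container.
      for (size_t idx = next.fetch_add(1); idx < container_paths.size();
           idx = next.fetch_add(1)) {
        OpenContainer(container_paths[idx]);
      }
    });
  }
  for (auto& t : workers) t.join();
}
{code}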





[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely

2019-11-19 Thread Yingchun Lai (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977296#comment-16977296
 ] 

Yingchun Lai edited comment on KUDU-2453 at 11/19/19 9:25 AM:
--

We also happened to see this issue. I created another Jira to track it, and 
also gave some ideas to resolve it.


was (Author: acelyc111):
We also happened to see this issue. I created another Jira to track it, and 
also give some ideas to resolve it.

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
>  Issue Type: Bug
>  Components: master, tserver
>Affects Versions: 1.4.0, 1.7.2
>Reporter: LiFu He
>Priority: Major
>
> I hit this problem again on 2018/10/26, and now the Kudu version is 1.7.2.
> -
> We modified the flag 'max_create_tablets_per_ts' (2000) in master.conf, and 
> there was some load on the Kudu cluster. Then someone else created a big 
> table with tens of thousands of tablets from impala-shell (that was a 
> mistake).
> {code:java}
> CREATE TABLE XXX(
> ...
>PRIMARY KEY (...)
> )
> PARTITION BY HASH (...) PARTITIONS 100,
> RANGE (...)
> (
>   PARTITION "2018-10-24" <= VALUES < "2018-10-24\000",
>   PARTITION "2018-10-25" <= VALUES < "2018-10-25\000",
>   ...
>   PARTITION "2018-12-07" <= VALUES < "2018-12-07\000"
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'= '...');
> {code}
> Here are the logs after creating the table (picking only one tablet as an 
> example):
> {code:java}
> --Kudu-master log
> ==e884bda6bbd3482f94c07ca0f34f99a4==
> W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC 
> failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:50247 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS 
> 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1)
> ...
> ==Be replaced by 0b144c00f35d48cca4d4981698faef72==
> W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72
> ...
> I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Sending 
> DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4
> ...
> I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4 
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by 
> 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST)
> ...
> W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for 
> tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING: 
> Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4 
> already in progress: creating tablet
> ...
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of 
> e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for 
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1)
> ...
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS 
> b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC 
> failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:59735 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS 
> b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1)
> ...
> ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75==
> W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> 

[jira] [Commented] (KUDU-2453) kudu should stop creating tablet infinitely

2019-11-19 Thread Yingchun Lai (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977296#comment-16977296
 ] 

Yingchun Lai commented on KUDU-2453:


We also happened to see this issue. I created another Jira to track it, and 
also gave some ideas to resolve it.

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
>  Issue Type: Bug
>  Components: master, tserver
>Affects Versions: 1.4.0, 1.7.2
>Reporter: LiFu He
>Priority: Major
>
> I hit this problem again on 2018/10/26, and now the Kudu version is 1.7.2.
> -
> We modified the flag 'max_create_tablets_per_ts' (2000) in master.conf, and 
> there was some load on the Kudu cluster. Then someone else created a big 
> table with tens of thousands of tablets from impala-shell (that was a 
> mistake).
> {code:java}
> CREATE TABLE XXX(
> ...
>PRIMARY KEY (...)
> )
> PARTITION BY HASH (...) PARTITIONS 100,
> RANGE (...)
> (
>   PARTITION "2018-10-24" <= VALUES < "2018-10-24\000",
>   PARTITION "2018-10-25" <= VALUES < "2018-10-25\000",
>   ...
>   PARTITION "2018-12-07" <= VALUES < "2018-12-07\000"
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'= '...');
> {code}
> Here are the logs after creating the table (picking only one tablet as an 
> example):
> {code:java}
> --Kudu-master log
> ==e884bda6bbd3482f94c07ca0f34f99a4==
> W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC 
> failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:50247 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS 
> 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1)
> ...
> ==Be replaced by 0b144c00f35d48cca4d4981698faef72==
> W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72
> ...
> I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Sending 
> DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4
> ...
> I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4 
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by 
> 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST)
> ...
> W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for 
> tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING: 
> Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4 
> already in progress: creating tablet
> ...
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of 
> e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for 
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1)
> ...
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS 
> b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC 
> failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:59735 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS 
> b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1)
> ...
> ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75==
> W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> timeout. Replacing with a new tablet c0e0acc448fc42fc9e48f5025b112a75
> ...
> --Kudu-tserver log
> I1024 11:40:52.014571 137358