[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely

2019-11-19 Thread Yingchun Lai (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977296#comment-16977296
 ] 

Yingchun Lai edited comment on KUDU-2453 at 11/19/19 9:25 AM:
--

We also happened to see this issue. I created another Jira to track it and also 
gave some ideas for resolving it.


was (Author: acelyc111):
We also happened to see this issue, I created another Jira to trace it, and also 
give some ideas to resolve it.

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.7.2
> Reporter: LiFu He
> Priority: Major
>
> I hit this problem again on 2018/10/26, this time on Kudu 1.7.2.
> -
> We modified the flag 'max_create_tablets_per_ts' (2000) in master.conf, and 
> there was some load on the Kudu cluster. Then someone else mistakenly created 
> a big table with tens of thousands of tablets from impala-shell.
> {code:java}
> CREATE TABLE XXX(
> ...
>   PRIMARY KEY (...)
> )
> PARTITION BY HASH (...) PARTITIONS 100,
> RANGE (...)
> (
>   PARTITION "2018-10-24" <= VALUES < "2018-10-24\000",
>   PARTITION "2018-10-25" <= VALUES < "2018-10-25\000",
>   ...
>   PARTITION "2018-12-07" <= VALUES < "2018-12-07\000"
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'= '...');
> {code}
> Here are the logs after creating the table (one tablet picked as an example):
> {code:java}
> --Kudu-master log
> ==e884bda6bbd3482f94c07ca0f34f99a4==
> W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC 
> failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:50247 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS 
> 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1)
> ...
> ==Be replaced by 0b144c00f35d48cca4d4981698faef72==
> W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72
> ...
> I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Sending 
> DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4
> ...
> I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4 
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by 
> 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST)
> ...
> W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for 
> tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING: 
> Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4 
> already in progress: creating tablet
> ...
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of 
> e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for 
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1)
> ...
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS 
> b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC 
> failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service 
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService 
> from 10.120.219.118:59735 dropped due to backpressure. The service queue is 
> full; it has 512 items.
> I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of 
> CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS 
> b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1)
> ...
> ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75==
> W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T 
>  P f6c9a09da7ef4fc191cab6276b942ba3: Tablet 
> 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature 
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed 
> timeout. Replacing with a new tablet c0e0acc448fc42fc9e48f5025b112a75
> ...
> 
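For scale, the tablet count implied by the CREATE TABLE statement quoted above can be worked out directly. This is only a rough sketch: it assumes the elided range partitions are one per day from 2018-10-24 through 2018-12-07, and it uses the 3 replicas per tablet seen in the DeleteTablet log line. The function name is made up for illustration.

```python
from datetime import date


def tablet_count(first_day, last_day, hash_buckets, replicas):
    """Return (range_partitions, tablets, total_replicas).

    Assumes one range partition per calendar day, inclusive of both ends.
    """
    range_partitions = (last_day - first_day).days + 1
    # Each range partition is split into `hash_buckets` tablets.
    tablets = range_partitions * hash_buckets
    return range_partitions, tablets, tablets * replicas


# Values taken from the quoted DDL (HASH ... PARTITIONS 100) and the
# "3 replicas" log line; the daily range layout is an assumption.
ranges, tablets, total_replicas = tablet_count(
    date(2018, 10, 24), date(2018, 12, 7), hash_buckets=100, replicas=3
)
print(ranges, tablets, total_replicas)
```

Under those assumptions the statement asks for 45 range partitions, 4,500 tablets, and 13,500 replicas. Each replica needs its own CreateTablet request, which makes it plausible that a 512-item service queue on a tablet server fills up.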

[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely

2018-12-27 Thread HeLifu (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493153#comment-16493153
 ] 

HeLifu edited comment on KUDU-2453 at 12/28/18 1:44 AM:


-I dug into it and thought the cause was KUDU-1913.-

-version 1.4.x-


was (Author: helifu):
I dug into it and thought the cause was KUDU-1913.

version 1.4.x


[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely

2018-10-31 Thread HeLifu (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493153#comment-16493153
 ] 

HeLifu edited comment on KUDU-2453 at 10/31/18 8:56 AM:


I dug into it and thought the cause was KUDU-1913.

version 1.4.x


was (Author: helifu):
I dug into it and thought the cause was 
[KUDU-1913|https://issues.apache.org/jira/browse/KUDU-1913]

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.7.2
> Reporter: HeLifu
> Priority: Major
>
> I hit this problem again on 2018/10/26, this time on Kudu 1.7.2.
> The kudu-master log is below:
> {code:java}
> I1031 16:21:21.644222 180146 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_DELETED) for tablet d1fd56be8eef44e782d509a0eeae9c15 
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by 
> ff4fd0a538944d69b8a6beea81e5bb01 at 2018-10-24 12:39:17 CST)
> W1031 16:21:21.644421 180146 catalog_manager.cc:2892] TS 
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for 
> tablet d1fd56be8eef44e782d509a0eeae9c15 with error code TABLET_NOT_RUNNING: 
> Already present: State transition of tablet d1fd56be8eef44e782d509a0eeae9c15 
> already in progress: creating tablet
> I1031 16:21:21.644436 180146 catalog_manager.cc:2700] Scheduling retry of 
> d1fd56be8eef44e782d509a0eeae9c15 Delete Tablet RPC for 
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 553 ms (attempt = 6)
> {code}
> The kudu-tserver log is below:
>  
> {code:java}
> I1031 16:21:22.197888 137341 tablet_service.cc:799] Processing DeleteTablet 
> for tablet d1fd56be8eef44e782d509a0eeae9c15 with delete_type 
> TABLET_DATA_DELETED (Replaced by ff4fd0a538944d69b8a6beea81e5bb01 at 
> 2018-10-24 12:39:17 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.230309 137131 maintenance_manager.cc:492] P 
> 39f15fcf42ef45bba0c95a3223dc25ee: 
> FlushDeltaMemStoresOp(70499bc0f9ac4d8196ae5a0be6ef0b8b) complete. Timing: 
> real 0.416s user 0.404s sys 0.008s Metrics: 
> {"fdatasync":3,"fdatasync_us":2583,"lbm_write_time_us":29,"lbm_writes_lt_1ms":4}
> I1031 16:21:22.321700 137341 tablet_service.cc:799] Processing DeleteTablet 
> for tablet 74a30181dea9400a9bcfaeb56f83f379 with delete_type 
> TABLET_DATA_DELETED (Replaced by 31e350fddea443048946f5a20d3171bd at 
> 2018-10-31 16:21:13 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.350440 137341 tablet_service.cc:799] Processing DeleteTablet 
> for tablet 7c864af01309432c9a2a4d1c88bbe52b with delete_type 
> TABLET_DATA_DELETED (Replaced by ec4b733818d940e0af32c51bda3c7^C
> {code}
>  
> ---
> We modified the flag 'max_create_tablets_per_ts' (2000) in master.conf, and 
> there was some load on the Kudu cluster. Then someone else mistakenly created 
> a big table with tens of thousands of tablets from impala-shell.
> The statement took a long time, so he pressed Ctrl+C. But we found that the 
> number of tablets in 'INITIALIZED' status kept growing rapidly; half an hour 
> later it had reached 350,000 :(
> We deleted the table with the kudu client tool and found that the number of 
> 'INITIALIZED' tablets went down only slowly. By a rough estimate it will take 
> 10+ days to get back to normal. Luckily, the application systems are not 
> affected.
>  
>  
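The retry delays logged in this thread (35 to 46 ms at attempt 1, 553 ms at attempt 6) look like exponential backoff with a small random jitter. A minimal sketch of such a schedule follows, assuming a base delay of 2^(attempt + 3) ms plus uniform jitter below 50 ms. Both constants are inferred from those log lines and are not taken from the Kudu source, and the function names are hypothetical.

```python
import random

# Assumed constants, chosen only because they fit the logged delays.
BASE_SHIFT = 3   # base delay is 2 ** (attempt + BASE_SHIFT) milliseconds
JITTER_MS = 50   # jitter is drawn uniformly from [0, JITTER_MS)


def retry_delay_bounds_ms(attempt):
    """Return the (min, max) delay in ms this sketch can produce."""
    base = 1 << (attempt + BASE_SHIFT)
    return base, base + JITTER_MS - 1


def retry_delay_ms(attempt, rng=random):
    """Return one randomized delay in ms for the given attempt number."""
    base = 1 << (attempt + BASE_SHIFT)
    return base + rng.randrange(JITTER_MS)


if __name__ == "__main__":
    for attempt in (1, 6):
        print(attempt, retry_delay_bounds_ms(attempt), retry_delay_ms(attempt))
```

All four logged delays (42, 46 and 35 ms at attempt 1; 553 ms at attempt 6) fall inside the ranges this sketch produces, 16 to 65 ms and 512 to 561 ms respectively. A schedule like this spaces out retries of a single RPC, but it does not by itself limit how many replacement tablets the master creates once the creation timeout expires.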



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)