[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely
[ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977296#comment-16977296 ]

Yingchun Lai edited comment on KUDU-2453 at 11/19/19 9:25 AM:
--

We also happened to see this issue. I created another Jira to track it, and also gave some ideas to resolve it.

was (Author: acelyc111):
We also happened to see this issue. I created another Jira to track it, and also give some ideas to resolve it.

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.7.2
> Reporter: LiFu He
> Priority: Major
>
> I have met this problem again on 2018/10/26. And now the kudu version is
> 1.7.2.
> -
> We modified the flag 'max_create_tablets_per_ts' (2000) of master.conf, and
> there was some load on the kudu cluster. Then someone else created a big
> table which had tens of thousands of tablets from impala-shell (that was a
> mistake).
> {code:java}
> CREATE TABLE XXX(
>   ...
>   PRIMARY KEY (...)
> )
> PARTITION BY HASH (...) PARTITIONS 100,
> RANGE (...)
> (
>   PARTITION "2018-10-24" <= VALUES < "2018-10-24\000",
>   PARTITION "2018-10-25" <= VALUES < "2018-10-25\000",
>   ...
>   PARTITION "2018-12-07" <= VALUES < "2018-12-07\000"
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'= '...');
> {code}
> Here are the logs after creating the table (one tablet picked as an example):
> {code:java}
> --Kudu-master log
> ==e884bda6bbd3482f94c07ca0f34f99a4==
> W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC
> failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService
> from 10.120.219.118:50247 dropped due to backpressure. The service queue is
> full; it has 512 items.
> I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of
> CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS
> 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1)
> ...
> ==Be replaced by 0b144c00f35d48cca4d4981698faef72==
> W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Tablet
> e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed
> timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72
> ...
> I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Sending
> DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4
> ...
> I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending
> DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by
> 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST)
> ...
> W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for
> tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING:
> Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4
> already in progress: creating tablet
> ...
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of
> e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1)
> ...
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS
> b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC
> failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService
> from 10.120.219.118:59735 dropped due to backpressure. The service queue is
> full; it has 512 items.
> I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of
> CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS
> b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1)
> ...
> ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75==
> W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Tablet
> 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed
>
[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely
[ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493153#comment-16493153 ]

HeLifu edited comment on KUDU-2453 at 12/28/18 1:44 AM:

-I dug into it, and thought the reason was KUDU-1913.-

-version 1.4.x-

was (Author: helifu):
I dug into it, and thought the reason was KUDU-1913.

version 1.4.x

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.7.2
> Reporter: HeLifu
> Priority: Major
>
> I have met this problem again on 2018/10/26. And now the kudu version is
> 1.7.2.
> -
> We modified the flag 'max_create_tablets_per_ts' (2000) of master.conf, and
> there was some load on the kudu cluster. Then someone else created a big
> table which had tens of thousands of tablets from impala-shell (that was a
> mistake).
> {code:java}
> CREATE TABLE XXX(
>   ...
>   PRIMARY KEY (...)
> )
> PARTITION BY HASH (...) PARTITIONS 100,
> RANGE (...)
> (
>   PARTITION "2018-10-24" <= VALUES < "2018-10-24\000",
>   PARTITION "2018-10-25" <= VALUES < "2018-10-25\000",
>   ...
>   PARTITION "2018-12-07" <= VALUES < "2018-12-07\000"
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'= '...');
> {code}
> Here are the logs after creating the table (one tablet picked as an example):
> {code:java}
> --Kudu-master log
> ==e884bda6bbd3482f94c07ca0f34f99a4==
> W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC
> failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService
> from 10.120.219.118:50247 dropped due to backpressure. The service queue is
> full; it has 512 items.
> I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of
> CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS
> 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1)
> ...
> ==Be replaced by 0b144c00f35d48cca4d4981698faef72==
> W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Tablet
> e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed
> timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72
> ...
> I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Sending
> DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4
> ...
> I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending
> DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by
> 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST)
> ...
> W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for
> tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING:
> Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4
> already in progress: creating tablet
> ...
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of
> e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1)
> ...
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS
> b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC
> failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service
> unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService
> from 10.120.219.118:59735 dropped due to backpressure. The service queue is
> full; it has 512 items.
> I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of
> CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS
> b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1)
> ...
> ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75==
> W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T
> P f6c9a09da7ef4fc191cab6276b942ba3: Tablet
> 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature
> [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed
> timeout. Replacing with a new tablet c0e0acc448fc42fc9e48f5025b112a75
> ...
>
[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely
[ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493153#comment-16493153 ]

HeLifu edited comment on KUDU-2453 at 10/31/18 8:56 AM:

I dug into it, and thought the reason was KUDU-1913.

version 1.4.x

was (Author: helifu):
I dug into it, and thought the reason was [KUDU-1913|https://issues.apache.org/jira/browse/KUDU-1913]

> kudu should stop creating tablet infinitely
> ---
>
> Key: KUDU-2453
> URL: https://issues.apache.org/jira/browse/KUDU-2453
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.7.2
> Reporter: HeLifu
> Priority: Major
>
> I have met this problem again on 2018/10/26. And now the kudu version is
> 1.7.2.
> kudu-master's log as below:
> {code:java}
> I1031 16:21:21.644222 180146 catalog_manager.cc:2922] Sending
> DeleteTablet(TABLET_DATA_DELETED) for tablet d1fd56be8eef44e782d509a0eeae9c15
> on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by
> ff4fd0a538944d69b8a6beea81e5bb01 at 2018-10-24 12:39:17 CST)
> W1031 16:21:21.644421 180146 catalog_manager.cc:2892] TS
> 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for
> tablet d1fd56be8eef44e782d509a0eeae9c15 with error code TABLET_NOT_RUNNING:
> Already present: State transition of tablet d1fd56be8eef44e782d509a0eeae9c15
> already in progress: creating tablet
> I1031 16:21:21.644436 180146 catalog_manager.cc:2700] Scheduling retry of
> d1fd56be8eef44e782d509a0eeae9c15 Delete Tablet RPC for
> TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 553 ms (attempt = 6)
> {code}
> kudu-tserver's log as below:
>
> {code:java}
> I1031 16:21:22.197888 137341 tablet_service.cc:799] Processing DeleteTablet
> for tablet d1fd56be8eef44e782d509a0eeae9c15 with delete_type
> TABLET_DATA_DELETED (Replaced by ff4fd0a538944d69b8a6beea81e5bb01 at
> 2018-10-24 12:39:17 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.230309 137131 maintenance_manager.cc:492] P
> 39f15fcf42ef45bba0c95a3223dc25ee:
> FlushDeltaMemStoresOp(70499bc0f9ac4d8196ae5a0be6ef0b8b) complete. Timing:
> real 0.416s user 0.404s sys 0.008s Metrics:
> {"fdatasync":3,"fdatasync_us":2583,"lbm_write_time_us":29,"lbm_writes_lt_1ms":4}
> I1031 16:21:22.321700 137341 tablet_service.cc:799] Processing DeleteTablet
> for tablet 74a30181dea9400a9bcfaeb56f83f379 with delete_type
> TABLET_DATA_DELETED (Replaced by 31e350fddea443048946f5a20d3171bd at
> 2018-10-31 16:21:13 CST) from {username='kudu'} at 10.120.219.118:50247
> I1031 16:21:22.350440 137341 tablet_service.cc:799] Processing DeleteTablet
> for tablet 7c864af01309432c9a2a4d1c88bbe52b with delete_type
> TABLET_DATA_DELETED (Replaced by ec4b733818d940e0af32c51bda3c7^C
> {code}
>
> ---
> We modified the flag 'max_create_tablets_per_ts' (2000)
> of master.conf, and there was some load on the kudu cluster. Then someone else
> created a big table which had tens of thousands of tablets from impala-shell
> (it was a mistake).
> The statement took a long time, so he pressed Ctrl+C. But we found that the
> number of tablets in 'INITIALIZED' status was growing rapidly; half an hour
> later it was 350,000 :(
> We deleted this table with the kudu client tool, and found that the number of
> 'INITIALIZED' tablets was going down slowly. By a rough estimate it will
> take 10+ days to get back to normal. But luckily, the application systems are
> not affected.
>
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
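
For a sense of scale, here is a rough back-of-the-envelope sketch in Python. It is not part of the original report. It takes the hash bucket count and the first and last range partitions from the CREATE TABLE statement quoted in the earlier messages, and the replica count from the "3 replicas" master log line. Two things are assumptions rather than facts from the report: that the elided "..." in the range list holds exactly one partition per day, and that no tablet is ever created successfully, so every tablet is replaced once per timeout (the quoted log shows replacements roughly 30 seconds apart).

```python
from datetime import date

# Values read from the quoted CREATE TABLE statement and master log.
HASH_BUCKETS = 100               # PARTITION BY HASH (...) PARTITIONS 100
FIRST_DAY = date(2018, 10, 24)   # first range partition shown
LAST_DAY = date(2018, 12, 7)     # last range partition shown
REPLICAS_PER_TABLET = 3          # "Sending DeleteTablet for 3 replicas"

# ASSUMPTION: the elided "..." contains exactly one range partition per day.
range_partitions = (LAST_DAY - FIRST_DAY).days + 1

tablets = HASH_BUCKETS * range_partitions
replicas = tablets * REPLICAS_PER_TABLET

# ASSUMPTION: no CreateTablet ever succeeds, so the master replaces every
# tablet once per timeout. The log excerpt shows replacements about 30 s apart.
REPLACE_INTERVAL_S = 30
ELAPSED_S = 30 * 60              # "half an hour later"
rounds = ELAPSED_S // REPLACE_INTERVAL_S

# Original tablets plus one full set of replacements per round.
tablets_ever_created = tablets * (1 + rounds)

print(f"range partitions:        {range_partitions}")
print(f"tablets in the table:    {tablets}")
print(f"replicas to create:      {replicas}")
print(f"tablets after {rounds} rounds: {tablets_ever_created}")
```

Under these assumptions the table has 4,500 tablets (13,500 replicas), and about 274,500 tablet records pile up in half an hour, the same order of magnitude as the 350,000 'INITIALIZED' tablets reported. The report itself speaks of tens of thousands of tablets, so the real partition list was probably larger than this sketch assumes.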