[ 
https://issues.apache.org/jira/browse/KUDU-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-2681.
-------------------------------
    Fix Version/s: 1.12.0
       Resolution: Duplicate

This seems like a duplicate of KUDU-2992

> Account for already-running tasks before sending new ones upon processing 
> tablet reports
> ----------------------------------------------------------------------------------------
>
>                 Key: KUDU-2681
>                 URL: https://issues.apache.org/jira/browse/KUDU-2681
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.7.1
>            Reporter: Andrew Wong
>            Priority: Major
>              Labels: scalability, supportability
>             Fix For: 1.12.0
>
>
> I've seen a case where the master will reschedule the same delete tablet task 
> for a given tablet multiple times, e.g. because it received a new tablet 
> report that the tablet still exists on a given tserver. This results in 
> significant log-spam, and ends up sending excessive RPCs to the tablet 
> servers. Here are some master logs demonstrating this (note the repeated 
> attempt numbers):
>  
> {{I0129 05:09:43.918886 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:43.919509 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:43.919517 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8226 ms (attempt = 10)}}
> {{I0129 05:09:43.960479 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:43.961150 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:43.961158 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8235 ms (attempt = 10)}}
> {{I0129 05:09:44.016152 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.016383 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.016391 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8206 ms (attempt = 10)}}
> {{I0129 05:09:44.226428 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.226753 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.226773 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8207 ms (attempt = 10)}}
> {{I0129 05:09:44.234709 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.234923 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.234936 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8199 ms (attempt = 10)}}
> {{I0129 05:09:44.322351 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.322854 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.322860 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8207 ms (attempt = 10)}}
>  
> Here is a similar log for alter schema requests:
>  
> {{I0129 05:09:50.798943 22188 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60047 
> ms (attempt = 27)}}
> {{W0129 05:09:50.798954 22188 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
> {{I0129 05:09:50.889256 22191 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60044 
> ms (attempt = 27)}}
> {{W0129 05:09:50.889267 22191 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
> {{I0129 05:09:50.928457 22188 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60001 
> ms (attempt = 27)}}
> {{W0129 05:09:50.928469 22188 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
> {{I0129 05:09:51.024364 22188 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60036 
> ms (attempt = 27)}}
> {{W0129 05:09:51.024372 22188 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
> {{I0129 05:09:51.061228 22191 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60001 
> ms (attempt = 27)}}
> {{W0129 05:09:51.061239 22191 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
> {{I0129 05:09:51.080605 22191 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC with a delay of 60038 
> ms (attempt = 27)}}
> {{W0129 05:09:51.080643 22191 catalog_manager.cc:2719] Async tablet task 
> 1d75a2458b544c6ea01fb6ccb238ebbb (table <table_name> 
> [id=7149156d1c2f4f0d88e131848a907d8a]) Alter Table RPC failed: Not found: 
> failed to reset TS proxy: Could not find TS for UUID 
> 94b8851c35914acca3ebea8c0ba6e06e}}
>  
> Kudu should take into account existing, retrying tasks running on the master 
> to make sure we don't stack up in this way.
> I've seen this confirmed with Kudu 1.7.1, but looking through the code at 
> CatalogManager::ProcessTabletReport, this seems possible on master too.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to