Clay B. created YARN-7106:
-----------------------------

             Summary: YARN RM can be crashed requesting too many delegation tokens
                 Key: YARN-7106
                 URL: https://issues.apache.org/jira/browse/YARN-7106
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Clay B.

My team has seen an interesting issue where a cluster can suffer an accidental denial-of-service when someone (or something) requests a very large number of delegation tokens, resulting in the RM becoming unresponsive. In particular, one sees the symptoms of YARN-2368, "ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB", but in our case the failure occurs when the RM goes to enumerate the znode /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDelegationTokensRoot. (Note that delegation tokens are not stored under a znode path tied to application IDs.)

We seem to have some users who are good at causing this via Oozie. One can also trigger it by "errantly" running the following in a tight loop:

    curl -H "Content-Type: application/json" -X POST \
         -d '{ "renewer" : "hdfsdu" }' -u : --negotiate \
         http://f-bcpc-vm2.example.bloomberg.com:8088/ws/v1/cluster/delegation-token

However, I can't find any limit on the number of delegation tokens per user, nor can I find a way to see where the requests are coming from. (That is, I would like to be able to get the IP of clients requesting tokens. The znodes do record the user, so one can at least track down the who, but not the where, which is often something one must do when a cluster user has an errant job.)

In the vein of YARN-2962, one possible remediation would be a namespace in ZK per user, so that a user can only denial-of-service themselves. Alternatively, YARN could raise an exception if more tokens than some threshold are outstanding.
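
To make the threshold idea concrete, here is a minimal sketch in Java of a per-user cap on outstanding tokens. Everything in it is an assumption for illustration: PerUserTokenLimiter, tryAcquire, release and outstandingFor are hypothetical names, not existing YARN or Hadoop APIs, and the sketch says nothing about where such a check would hook into the RM or what the threshold should be.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not part of Hadoop): caps outstanding tokens per user. */
final class PerUserTokenLimiter {
    private final int maxTokensPerUser;
    // user name -> number of tokens currently outstanding for that user
    private final ConcurrentMap<String, Integer> outstanding = new ConcurrentHashMap<>();

    PerUserTokenLimiter(int maxTokensPerUser) {
        if (maxTokensPerUser <= 0) {
            throw new IllegalArgumentException("maxTokensPerUser must be positive");
        }
        this.maxTokensPerUser = maxTokensPerUser;
    }

    /** Records one more token for the user and returns true, or returns false at the cap. */
    boolean tryAcquire(String user) {
        boolean[] granted = {false};
        // compute() runs atomically per key, so check-and-increment cannot race.
        outstanding.compute(user, (u, count) -> {
            int current = (count == null) ? 0 : count;
            if (current >= maxTokensPerUser) {
                return count; // at the cap: leave the count unchanged
            }
            granted[0] = true;
            return current + 1;
        });
        return granted[0];
    }

    /** Call when a token expires or is cancelled; drops the entry when it reaches zero. */
    void release(String user) {
        outstanding.computeIfPresent(user, (u, count) -> count > 1 ? count - 1 : null);
    }

    /** Number of tokens currently counted against the user. */
    int outstandingFor(String user) {
        return outstanding.getOrDefault(user, 0);
    }
}
```

A caller would check tryAcquire(user) before issuing a token and raise an exception when it returns false. With a cap like this, one user's errant job exhausts only that user's allowance rather than growing the shared znode without bound.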