[jira] [Comment Edited] (HADOOP-16828) Zookeeper Delegation Token Manager fetch sequence number by batch

2021-03-24 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308088#comment-17308088
 ] 

Fengnan Li edited comment on HADOOP-16828 at 3/24/21, 6:14 PM:
---

[~sodonnell] We use 1k, which is the value the graph is based on. It has been 
running well for the past year without any issue. Our setup has 12 machines 
competing for this value in ZK.
I think you can even change it to 10k without much negative impact.


was (Author: fengnanli):
[~sodonnell] We use 1k which is where the graph is based. It is running well 
for the past one year without any issue. I think you can even change it to 10k 
without much negative impact.

> Zookeeper Delegation Token Manager fetch sequence number by batch
> -
>
> Key: HADOOP-16828
> URL: https://issues.apache.org/jira/browse/HADOOP-16828
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HADOOP-16828.001.patch, HADOOP-16828.002.patch, Screen 
> Shot 2020-01-25 at 2.25.06 PM.png, Screen Shot 2020-01-25 at 2.25.16 PM.png, 
> Screen Shot 2020-01-25 at 2.25.24 PM.png
>
>
> Currently in ZKDelegationTokenSecretManager.java the sequence number is 
> incremented by 1 each time a new token is requested, and each increment 
> sends traffic to the ZooKeeper server. With multiple managers running, they 
> contend for the same data. Because the current increment logic uses 
> tryAndSet, which is optimistic concurrency control without locking, this 
> contention degrades performance when the secret managers are under heavy 
> traffic.
> The change here is to fetch sequence numbers in batches instead of one at a 
> time, which reduces the traffic sent to ZK and lets many operations complete 
> in the ZK secret manager's memory.
> After putting this into production we saw a large improvement in the RPC 
> processing latency of get-delegation-token calls. Also, since ZK takes less 
> traffic this way, other write calls such as renewing and cancelling 
> delegation tokens benefit from this change.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16828) Zookeeper Delegation Token Manager fetch sequence number by batch

2020-03-02 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049957#comment-17049957
 ] 

Fengnan Li edited comment on HADOOP-16828 at 3/3/20 7:00 AM:
-

[~xyao] Thanks very much for the review! Really appreciate it.

In fact, after the initial patch I found a bug in using 
delTokenSeqCounter.getCount(): it reads a SharedCount in ZK that other secret 
managers can change, so I replaced it with a constant value.

I also changed the logic slightly so that we compete for the start of the 
range and then count up, rather than competing for the upper limit and 
deriving the range start, since the former is easier to understand.

Uploaded [^HADOOP-16828.002.patch] addressing your comments as well.

The holes are expected as a tradeoff of this strategy. Many account 
registration services adopt it for much faster ID generation.
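
For readers following the thread, the range-based approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the class name BatchSequenceSketch and its fields are hypothetical, and an in-process AtomicInteger stands in for the shared counter kept in ZK. Each manager competes for the start of a range with a compare-and-set, then counts up locally until the range is used up.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: hand out sequence numbers from locally reserved ranges. */
public class BatchSequenceSketch {
    // Assumption: stands in for the shared counter stored in ZooKeeper.
    private final AtomicInteger sharedCounter;
    // Constant batch size, e.g. 1000 in the deployment mentioned in this thread.
    private final int batchSize;
    private int current;   // last number handed out locally
    private int rangeEnd;  // last number of the reserved range (inclusive)

    public BatchSequenceSketch(AtomicInteger sharedCounter, int batchSize) {
        this.sharedCounter = sharedCounter;
        this.batchSize = batchSize;
        // current == rangeEnd means the local range is empty, so the first
        // call reserves a range.
        this.current = 0;
        this.rangeEnd = 0;
    }

    /** Returns the next sequence number, touching the shared counter once per batch. */
    public synchronized int nextSequenceNumber() {
        if (current >= rangeEnd) {
            reserveRange();
        }
        return ++current;
    }

    /** Competes for the start of a range using optimistic compare-and-set. */
    private void reserveRange() {
        while (true) {
            int start = sharedCounter.get();
            // Retry if another manager moved the counter in the meantime.
            if (sharedCounter.compareAndSet(start, start + batchSize)) {
                current = start;
                rangeEnd = start + batchSize;
                return;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger shared = new AtomicInteger(0);
        BatchSequenceSketch a = new BatchSequenceSketch(shared, 3);
        BatchSequenceSketch b = new BatchSequenceSketch(shared, 3);
        System.out.println(a.nextSequenceNumber()); // 1 (a reserved 1..3)
        System.out.println(b.nextSequenceNumber()); // 4 (b reserved 4..6)
        System.out.println(a.nextSequenceNumber()); // 2
        System.out.println(a.nextSequenceNumber()); // 3
        System.out.println(a.nextSequenceNumber()); // 7 (a reserved 7..9)
    }
}
```

If manager b stops after handing out 4, the numbers 5 and 6 are never issued. Those are the holes mentioned above; in exchange, the shared counter is contended only once per batch instead of once per token.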


was (Author: fengnanli):
[~xyao] Thanks very much for the review! Really appreciate it.

In fact, after the initial patch I found the bug of using 
delTokenSeqCounter.getCount() since it maintains a SharedCount in ZK which will 
be changed by other secret managers, so I replaced it with a constant value.

I also changed the logic a little bit so that we are competing for the starting 
of the range and then counting up. v.s. competing for the upper limit and get 
the range start since the former is more intuitive to understand.

Uploaded [^HADOOP-16828.002.patch] addressing your comments as well.

The holes of expected as a tradeoff to this strategy. Many account registration 
services are adopting this for way faster id generation.



