[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054423#comment-16054423 ] Hudson commented on YARN-6680: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11886 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11886/]) YARN-6680. Avoid locking overhead for NO_LABEL lookups. Contributed by (naganarasimha_gr: rev ee89ac84e68d3e181b75c63f74a0444f9d28146f) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 3.0.0-alpha4, 2.8.2, 2.9 > > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054242#comment-16054242 ] Naganarasimha G R commented on YARN-6680: - Thanks [~jlowe], agree with you on the multi-thread scenario/issue, that was the reason i was trying to mention that when ever write happens we need to update a new resource object instead of updating the existing one which will avoid the issue which you mentioned. Ok as i get other additional +1, i will commit this shortly ! > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054197#comment-16054197 ] Jason Lowe commented on YARN-6680: -- There definitely is a bug in the code with respect to locking in ResourceUsage, both before and after this proposed change. Besides the issues Daryn pointed out earlier, there's this problem: - Thread 1 calls getUsed on some label. Whether we lock or not, we can return the Resource object that is being used for bookkeeping. Once we return from the get, the caller has access to the bookeeping object with no locks held. - Thread 2 calls decUsed on the same label. It proceeds to mutate the _same Resource object_ with the write lock held. The lock doesn't help for this scenario, since Thread 1 already has the object being mutated and is not calling any ResourceUsage code at the time. - Thread 1 can now see an inconsistent view of the Resource, where the memory field has been decremented but the vcore field has yet to be decremented. In other words, a Resource usage that never actually occurred in practice. This locking bug has been there for quite some time. Daryn is simply optimizing what it already does today. I'm guessing the inconsistency isn't much of an issue in practice due to the granular scheduler and queue locks already being used during scheduling, which leaves the UI to show occasional inconsistent values since I believe it can grab these values without holding those same granular locks. I'm +1 for the patch. It significantly speeds up what is a very common case for us, and I suspect no node label is fairly common among other users as well. Eventually we should try to make this completely lockless as much as possible, using ConcurrentHashMap where the map stores atomic snapshot objects of state where we need to update many at once. But that's a more significant effort for another JIRA. This is a small change that offers a nice speedup for a common scenario in the interim. > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043716#comment-16043716 ] Naganarasimha G R commented on YARN-6680: - [~daryn], I am kind of +0 on this, so i would wait for another +1 to go ahead on this . bq. Cloning the resource would be very bad. Resource object allocation overhead is already very high. Well i was referring to not on the read part but on the write/modification part. > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042949#comment-16042949 ] Daryn Sharp commented on YARN-6680: --- Any feedback? > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035505#comment-16035505 ] Daryn Sharp commented on YARN-6680: --- # Maybe it was the same before, I just attacked the top few hot spots in a profile to get us unblocked. I think Nathan measured a ~10% performance reduction but it was enough to push the RM off a cliff. I overall achieved a 2X performance increase from my patches under this umbrella so maybe this was low hanging fruit. # Ah yes. There are other (unnecessary) RW lock hotspots, but the label manager map uses locks to protect the concurrent map for essentially making a consistent copy. A concurrent map can be iterated w/o CME but isn't guaranteed to visit every entry hence why I think the locks are there. # Yes. The goal is optimize the common case. # Cloning the resource would be very bad. Resource object allocation overhead is already very high. > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035337#comment-16035337 ] Naganarasimha G R commented on YARN-6680: - Thanks for the clarification [~daryn], my query was majorly related to see the CPU profiler output based on which these modifications are done as earlier you had just mentioned 3x performance decrease. And good to hear that its already been reviewed by [~jlowe] and [~nroberts] , But just for my clarification could you help me in understanding following points: # IIUC , the RW locks of CommonNodeLabelsManager is same as in 2.7, so wonder why the performance was impacted if you have migrated from 2.7.x to 2.8. # ??Eventually these maps should be concurrent maps which uses no lock for read ops, and the memory read barriers are cheap?? {{labelCollections}} and {{nodeCollections}} are already concurrentmaps. so you meant RW locks are not required? # Assume we solve the issue for NO_LABEL but still the flow is same for other labels. So issue will still be present with flows for other labels? # did not quite get your consistent and Inconsistent example though agree that the resource object got during read gets modified using the write lock as the same resource object is being returned instead of a clone. cc [~wangda] > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034834#comment-16034834 ] Daryn Sharp commented on YARN-6680: --- I use a profiler for performance work – hunches are inevitable wrong. [~jlowe] and [~nroberts] verified the improvement. 2.8 is current DOA, see [details|https://issues.apache.org/jira/browse/YARN-6679?focusedCommentId=16033655=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16033655]. This patch, along with my others under the umbrella, increased overall performance by ~2X. The scheduler's fine grain locking was a bad idea. RW locks are not cheap esp. for tiny critical sections. Write barriers are extremely expensive - slower than a hash lookup. Surprising but true. Eventually these maps should be concurrent maps which uses no lock for read ops, and the memory read barriers are cheap. The processor just sniffs the cache lines. bq. i feel the locks atleast in few places are required to maintain consistency. [...] read lock is required as intermittently node's partition mapping could be changed, or node can be deactivated etc... all the ops where write lock is held ? The locks currently do not provide guaranteed consistency. Example: # Consistent: #* thread1 read locks, gets resource, unlocks #* thread2 write locks, updates resource #* thread1 accesses resource – won't see thread2 update immediately # Inconsistent: #* thread1 write locks, updates resource #* thread2 read locks, gets resource, unlocks, accesses resource – will see thread1 update With my patch, the reader won't see the update in either case (unless it was also the writer). The question is does it matter? It's already a race due to no coarse grain lock to provide a snapshot view in time. Will it have detrimental impact to possibly see slightly stale data? If it does then there's already a major bug in code. In the end, this patch contributed to making 2.8 actually deployable. > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034125#comment-16034125 ] Naganarasimha G R commented on YARN-6680: - Hi [~daryn], {{"resulting in a 3X slowdown"}} is based on test results on a non partitioned cluster or based on a hunch ? if based on test results map be some thing can be uploaded ? Agree in the normal flows its unnecessary to use this map but if the labels are enabled then i feel the locks atleast in few places are required to maintain consistency. for ex. {{RMNodeLabelsManager.getResourceByLabel()}} though query is done for nolabel read lock is required as intermittently node's partition mapping could be changed, or node can be deactivated etc... all the ops where write lock is held ? > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033665#comment-16033665 ] Daryn Sharp commented on YARN-6680: --- Findbugs warning is from {{AggregatedLogFormat}} which is not part of this patch. > Avoid locking overhead for NO_LABEL lookups > --- > > Key: YARN-6680 > URL: https://issues.apache.org/jira/browse/YARN-6680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: YARN-6680.patch > > > Labels are managed via a hash that is protected with a read lock. The lock > acquire and release are each just as expensive as the hash lookup itself - > resulting in a 3X slowdown. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups
[ https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033462#comment-16033462 ] Hadoop QA commented on YARN-6680: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 53s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 97 unchanged - 0 fixed = 98 total (was 97) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6680 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870820/YARN-6680.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b332be5b529a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ff31035 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16068/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16068/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | Test Results |