[jira] [Updated] (YARN-6770) [Docs] A small mistake in the example of TimelineClient
[ https://issues.apache.org/jira/browse/YARN-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6770: Fix Version/s: (was: 2.9) (was: 2.8.3) 2.8.2 2.9.0 > [Docs] A small mistake in the example of TimelineClient > --- > > Key: YARN-6770 > URL: https://issues.apache.org/jira/browse/YARN-6770 > Project: Hadoop YARN > Issue Type: Bug > Components: docs >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Trivial > Labels: newbie > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2 > > Attachments: YARN-6770.patch > > > I'm trying to use timeline client, then I copy the > [example|http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Publishing_of_application_specific_data] > into my application. > But there is a small mistake here: > {quote} > myDomain.*_setID_*("MyDomain"); > . > myEntity.*_setEntityID_*("MyApp1") > {quote} > The correct one should be > {quote} > myDomain.*_setId_*("MyDomain"); > . > myEntity._*setEntityId*_("MyApp1"); > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
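For reference, a corrected version of the documentation snippet discussed above could read like the sketch below. This is only an illustration built around the method names mentioned in the report ({{setId}} / {{setEntityId}}); the surrounding setup loosely follows the TimelineServer documentation example and is an assumption rather than a copy of the final doc fix.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineClientExample {
  public static void main(String[] args) throws Exception {
    // Create and start the timeline client.
    Configuration conf = new Configuration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();

    // Publish a domain; note setId(), not setID().
    TimelineDomain myDomain = new TimelineDomain();
    myDomain.setId("MyDomain");
    client.putDomain(myDomain);

    // Publish an entity; note setEntityId(), not setEntityID().
    TimelineEntity myEntity = new TimelineEntity();
    myEntity.setDomainId(myDomain.getId());
    myEntity.setEntityType("MyAppType");
    myEntity.setEntityId("MyApp1");
    client.putEntities(myEntity);

    client.stop();
  }
}
{code}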
[jira] [Updated] (YARN-6809) Fix typo in ResourceManagerHA.md
[ https://issues.apache.org/jira/browse/YARN-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6809: Fix Version/s: (was: 2.8.3) 2.8.2 > Fix typo in ResourceManagerHA.md > > > Key: YARN-6809 > URL: https://issues.apache.org/jira/browse/YARN-6809 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Yeliang Cang >Priority: Trivial > Labels: newbie > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2 > > Attachments: YARN-6809.001.patch > > > {noformat:title=ResourceManagerHA.md} > ### Recovering prevous active-RM's state > {noformat} > prevous should be previous. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-2113: Fix Version/s: (was: 2.8.3) 2.8.2 > Add cross-user preemption within CapacityScheduler's leaf-queue > --- > > Key: YARN-2113 > URL: https://issues.apache.org/jira/browse/YARN-2113 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil G > Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2 > > Attachments: IntraQueue Preemption-Impact Analysis.pdf, > TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt, > YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, > YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, > YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, > YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, > YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, > YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, > YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, > YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, > YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, > YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption > Behavior.pdf, YARN-2113.v0.patch > > > Preemption today only works across queues and moves around resources across > queues per demand and usage. We should also have user-level preemption within > a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6428) Queue AM limit is not honored in CS always
[ https://issues.apache.org/jira/browse/YARN-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6428: Fix Version/s: (was: 2.9) (was: 2.8.3) 2.8.2 2.9.0 > Queue AM limit is not honored in CS always > --- > > Key: YARN-6428 > URL: https://issues.apache.org/jira/browse/YARN-6428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2 > > Attachments: YARN-6428.0001.patch, YARN-6428.0002.patch, > YARN-6428.0003.patch, YARN-6428-branch-2.8.0003.patch > > > Steps to reproduce > > Set up a cluster with 40 GB and 40 vcores, with 4 NodeManagers of 10 GB each. > Configure the default queue with 100% capacity and a max AM limit of 10%. > Set the minimum scheduler allocation to 512 MB and 1 vcore. > *Expected* > AM limit of 4096 MB and 4 vcores > *Actual* > AM limit of 4096+512 MB and 4+1 vcores -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
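A worked version of the numbers in the reproduction steps, as an illustration only (this is plain arithmetic, not CapacityScheduler code):

{code:java}
public class AmLimitExpectation {
  public static void main(String[] args) {
    int clusterMemoryMb = 4 * 10 * 1024;   // 4 NodeManagers x 10 GB = 40960 MB
    int clusterVcores = 40;
    double maxAmResourcePercent = 0.10;    // max AM limit of 10%

    // Expected queue AM limit: 10% of the cluster resources.
    long expectedAmLimitMb = (long) (clusterMemoryMb * maxAmResourcePercent);   // 4096 MB
    long expectedAmLimitVcores = (long) (clusterVcores * maxAmResourcePercent); // 4 vcores
    System.out.println(expectedAmLimitMb + " MB, " + expectedAmLimitVcores + " vcores");

    // The report observes one extra minimum allocation (512 MB, 1 vcore)
    // being added on top of these values: 4608 MB and 5 vcores.
  }
}
{code}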
[jira] [Commented] (YARN-6844) AMRMClientImpl.checkNodeLabelExpression() has wrong error message
[ https://issues.apache.org/jira/browse/YARN-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099466#comment-16099466 ] Hudson commented on YARN-6844: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12051/]) YARN-6844. AMRMClientImpl.checkNodeLabelExpression() has wrong error (templedf: rev 4c40cd451cbdbce5d2b94ad0e7e3cc991c3439c5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java > AMRMClientImpl.checkNodeLabelExpression() has wrong error message > - > > Key: YARN-6844 > URL: https://issues.apache.org/jira/browse/YARN-6844 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.1, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Manikandan R >Priority: Minor > Labels: newbie > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6844.001.patch > > > It says, "Cannot specify more than two node labels in a single node label > expression," but it should say that you can't have more than *one*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
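The shape of the check behind that message, shown as a simplified illustration only (this is not the actual AMRMClientImpl source; the exception type and helper are assumptions), with the corrected wording:

{code:java}
public class NodeLabelExpressionCheck {
  /** Reject expressions that combine labels, e.g. "labelA && labelB". */
  static void checkNodeLabelExpression(String nodeLabelExpression) {
    if (nodeLabelExpression != null && nodeLabelExpression.contains("&&")) {
      throw new IllegalArgumentException(
          "Cannot specify more than one node label"
              + " in a single node label expression");
    }
  }

  public static void main(String[] args) {
    checkNodeLabelExpression("gpu");            // accepted
    checkNodeLabelExpression("gpu && x86_64");  // throws with the corrected message
  }
}
{code}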
[jira] [Commented] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting
[ https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099467#comment-16099467 ] Hudson commented on YARN-6779: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12051/]) YARN-6779. (templedf: rev bb30bd3771442df253cbe55c448379580bd5ad07) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java > DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() > should be @VisibleForTesting > > > Key: YARN-6779 > URL: https://issues.apache.org/jira/browse/YARN-6779 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.1, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Yeliang Cang >Priority: Trivial > Labels: newbie > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6779-001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI
[ https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099457#comment-16099457 ] Hadoop QA commented on YARN-5146: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 0m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-5146 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878674/YARN-5146.004.patch | | Optional Tests | asflicense | | uname | Linux 9203c8cfdaa1 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c98201b | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16534/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [YARN-3368] Supports Fair Scheduler in new YARN UI > -- > > Key: YARN-5146 > URL: https://issues.apache.org/jira/browse/YARN-5146 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Abdullah Yousufi > Attachments: YARN-5146.001.patch, YARN-5146.002.patch, > YARN-5146.003.patch, YARN-5146.004.patch > > > Current implementation in branch YARN-3368 only support capacity scheduler, > we want to make it support fair scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6855) CLI Proto Modifications to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099409#comment-16099409 ] Sunil G edited comment on YARN-6855 at 7/25/17 2:50 AM: Thanks [~naganarasimha...@apache.org] for the effort. Few comments In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and +{{NodesToAttributesMappingRequest}}+ # I think lets make this class as Unstable from Evolving as its a new api itself. In the course, we can make to Evolving. # please add more java doc. # I think its too early to place Stable for NodeAttributeType. Since its public and its an enum, is its ok if we mark interface stability with Unstable/Evolving to start with. In +{{NodesToAttributesMappingRequest}}+ and +{{yarn_server_resourcemanager_service_protos.proto}}+ # I think {{operation}} could be an enum here. String may be too generic and complex to do type checks. In general # {{NodeAttributePBImpl#equals}} has some duplicate code. # in {{NodesToAttributesMappingRequestPBImpl}}, {{operation}} needs some change if above comments is accepted. I will take a second look and will share comments if any. was (Author: sunilg): Thanks [~naganarasimha...@apache.org] for the effort. Few comments In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and +{{NodesToAttributesMappingRequest}}+ # I think lets make this class as Unstable from Evolving as its a new api itself. In the course, we can make to Evolving. # please add more java doc. # I think its too early to place Stable for NodeAttributeType. Since its public and its an enum, is its ok if we mark interface stability with Unstable/Evolving to start with. In +{{NodesToAttributesMappingRequest}}+ and +{{yarn_server_resourcemanager_service_protos.proto}}+ # I think {{operation}} could be an enum here. String may be too generic and complex to do type checks. In general # {{NodeAttributePBImpl#equals}} has some duplicate code. # > CLI Proto Modifications to support Node Attributes > -- > > Key: YARN-6855 > URL: https://issues.apache.org/jira/browse/YARN-6855 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-6855-YARN-3409.001.patch, > YARN-6855-YARN-3409.002.patch, YARN-6855-YARN-3409.003.patch > > > This jira focuses only on the proto modifications required for the CLI -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6855) CLI Proto Modifications to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099409#comment-16099409 ] Sunil G commented on YARN-6855: --- Thanks [~naganarasimha...@apache.org] for the effort. Few comments In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and +{{NodesToAttributesMappingRequest}}+ # I think lets make this class as Unstable from Evolving as its a new api itself. In the course, we can make to Evolving. # please add more java doc. # I think its too early to place Stable for NodeAttributeType. Since its public and its an enum, is its ok if we mark interface stability with Unstable/Evolving to start with. In +{{NodesToAttributesMappingRequest}}+ and +{{yarn_server_resourcemanager_service_protos.proto}}+ # I think {{operation}} could be an enum here. String may be too generic and complex to do type checks. In general # {{NodeAttributePBImpl#equals}} has some duplicate code. # > CLI Proto Modifications to support Node Attributes > -- > > Key: YARN-6855 > URL: https://issues.apache.org/jira/browse/YARN-6855 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-6855-YARN-3409.001.patch, > YARN-6855-YARN-3409.002.patch, YARN-6855-YARN-3409.003.patch > > > This jira focuses only on the proto modifications required for the CLI -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
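A minimal sketch of the enum suggested in the comments above, replacing the free-form {{operation}} string; the type and constant names here are hypothetical and not taken from the YARN-6855 patches:

{code:java}
// Hypothetical names for illustration only.
public enum AttributeMappingOperationType {
  ADD,      // add the listed attributes, overwriting values for existing keys
  REMOVE,   // remove the listed attributes from the node
  REPLACE   // drop all existing attributes on the node, then apply the new mapping
}
{code}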
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099402#comment-16099402 ] Naganarasimha G R commented on YARN-3409: - Thanks [~wangda] & [~kkaranasos] for sharing your comments, bq. NodeIdToAttributesProto actually be NodeToAttributesProto Agree, and I wanted to discuss this further. Inside the proto I was sending across NodeId; I think we can change it to a string. Also, even if the user specifies the hostname/IP:port format, do we need to consider picking only the "hostname/IP" part? Thoughts? bq. It was not clear to me how the newly added node attributes are going to play with existing node labels. Is the plan to share some code or will it be completely separate? Agree with Wangda's reply on this; there are still a lot of differences, even in the way we are going to use CommonNodeLabelManager for an Attribute and a Partition. Will try to bring a hierarchy into it so that some common things can be reused. bq. The main part of "2. API proto changes" should be read as proposal#1, and "Alternate Proposal 1"/"Alternate Proposal 2" should be read as proposal #2 and #3. Yep, I was trying to capture them as alternatives to the main proposal. bq. is the plan to use the new constraints API we are introducing in YARN-6593? Yes, it is the same as per the earlier discussion, but I did not get a chance to review it completely. I will have a look at it and point out any issues there. bq. In the CLI API the replace and update seem a bit confusing to me ... Agree with [~wangda]'s point, and my preference would be {{add}}. bq. Sounds a little ambiguous as it does not directly look like the existing attributes on the node will be removed, but we can make this clear in the description of the command. Agree, I think I have captured this in the CLI patch and will update it further if required. bq. node1:att1=val1 looks better than node1=att1:val1. IMHO I would prefer the latter one for the following reasons: 1. It might be a common scenario to specify multiple attribute-value pair mappings for a given node, and it would mean less input for users. 2. With the earlier notation, the port was specified after the {{":"}} and before the {{"="}} for labels, so it would be less intuitive for users to use the new format. 3. With an attribute type it might not read very well, e.g. {{node1:att1=val1:type}} or {{node1:att1:type=val1}}. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity.
Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099399#comment-16099399 ] Sunil G commented on YARN-3409: --- Thanks [~kkaranasos] for comments. Some quick thoughts here. bq.is the plan to use the new constraints API we are introducing in YARN-6593? Yes, you are correct. We will be using new api set from YARN-6593. There are some minor thoughts in relation with that, i ll add same in YARN-6593. bq.In the CLI API the replace and update seem a bit confusing to me ... Adding some more thoughts. In line with [~leftnoteasy] comments, {{replace}} is a single operation, since it is a superset of {{remove}} and {{remove+add}}. We support this in existing node labels with a similar command but syntax is sometime confusing. And I think, and as also mentioned above, {{add}} / {{remove}} will be an operation which may be more happening in system. {{replace}} might be happening when we take system to maintenance mode and do some upgrades. I think more descriptive help seems a very nice addition and better documentation. for cli, we ll handle both. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6733) Add table for storing sub-application entities
[ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099398#comment-16099398 ] Rohith Sharma K S commented on YARN-6733: - Thanks [~vrushalic] for detailed explanation. I am fine to keep to it. +1 LGTM. I will commit it later of today if no more objections. > Add table for storing sub-application entities > -- > > Key: YARN-6733 > URL: https://issues.apache.org/jira/browse/YARN-6733 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, > YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, > YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, > YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, > YARN-6733-YARN-5355.008.patch > > > After a discussion with Tez folks, we have been thinking over introducing a > table to store sub-application information. > For example, if a Tez session runs for a certain period as User X and runs a > few AMs. These AMs accept DAGs from other users. Tez will execute these dags > with a doAs user. ATSv2 should store this information in a new table perhaps > called as "sub_application" table. > This jira tracks the code changes needed for table schema creation. > I will file other jiras for writing to that table, updating the user name > fields to include sub-application user etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting
[ https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099391#comment-16099391 ] Yeliang Cang commented on YARN-6779: Thank you for your review, [~templedf]! > DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() > should be @VisibleForTesting > > > Key: YARN-6779 > URL: https://issues.apache.org/jira/browse/YARN-6779 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.1, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Yeliang Cang >Priority: Trivial > Labels: newbie > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6779-001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6593) [API] Introduce Placement Constraint object
[ https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-6593: Target Version/s: 3.0.0-beta1 Fix Version/s: (was: 3.0.0-alpha3) Description: Just removed Fixed version and moved it to target version as we set fix version only after patch is committed. (was: This JIRA introduces an object for defining placement constraints.) > [API] Introduce Placement Constraint object > --- > > Key: YARN-6593 > URL: https://issues.apache.org/jira/browse/YARN-6593 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-6593.001.patch, YARN-6593.002.patch, > YARN-6593.003.patch, YARN-6593.004.patch > > > Just removed Fixed version and moved it to target version as we set fix > version only after patch is committed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352 ] Ellen Hui edited comment on YARN-6413 at 7/25/17 1:10 AM: -- bq. looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that depends on having the hierarchical namespace, which is why it was removed. I can add a select-multiple method with a filter of some sort, would that work? The point of this from our end was to abstract away the path. Right now yarn-native-services doesn't compile for me with a pom.xml error (pulled just now), is the branch healthy? bq. The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. Ok, I will put the ZK path layout back to the way it was before. was (Author: ellenfkh): bq:looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that depends on having the hierarchical namespace, which is why it was removed. I can add a select-multiple method with a filter of some sort, would that work? The point of this from our end was to abstract away the path. Right now yarn-native-services doesn't compile for me with a pom.xml error (pulled just now), is the branch healthy? bq:The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. Ok, I will put the ZK path layout back to the way it was before. > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352 ] Ellen Hui commented on YARN-6413: - bq: looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that depends on having the hierarchical namespace, which is why it was removed. I can add a select-multiple method with a filter of some sort, would that work? The point of this from our end was to abstract away the path. Right now yarn-native-services doesn't compile for me with a pom.xml error (pulled just now), is the branch healthy? bq: The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. Ok, I will put the ZK path layout back to the way it was before. > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352 ] Ellen Hui edited comment on YARN-6413 at 7/25/17 1:09 AM: -- bq:looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that depends on having the hierarchical namespace, which is why it was removed. I can add a select-multiple method with a filter of some sort, would that work? The point of this from our end was to abstract away the path. Right now yarn-native-services doesn't compile for me with a pom.xml error (pulled just now), is the branch healthy? bq:The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. Ok, I will put the ZK path layout back to the way it was before. was (Author: ellenfkh): bq: looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that depends on having the hierarchical namespace, which is why it was removed. I can add a select-multiple method with a filter of some sort, would that work? The point of this from our end was to abstract away the path. Right now yarn-native-services doesn't compile for me with a pom.xml error (pulled just now), is the branch healthy? bq: The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. Ok, I will put the ZK path layout back to the way it was before. > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099345#comment-16099345 ] Jian He commented on YARN-6413: --- bq. I did not remove any methods, I just didn't add the ones that exist in yarn-native-services but not trunk. I thought doing so would probably cause more conflicts than not. looks like RegistryUtils#extractServiceRecords is removed. Do you mind change RegistryDNSServer to use the new methods ? (i.e. make this patch compile with yarn-native-services branch) bq. My understanding was that DNS went through ZK directly without going through the interface, so it wouldn't be affected by the service records setting up the path differently. I can change the path construction back for the ZK impl if it needs that. The DNS today depends on the ZK path layout to reconstruct the service record. So changing the zk path will break DNS. > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
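For context, the register/delete/resolve direction described in the YARN-6413 issue could be sketched roughly as below. The interface and method names are hypothetical, intended only to illustrate abstracting away the ZK path layout discussed in the comments; only {{ServiceRecord}} is an existing registry type.

{code:java}
import java.io.IOException;
import org.apache.hadoop.registry.client.types.ServiceRecord;

/** Hypothetical sketch; not the interface from the attached patches. */
public interface ServiceRecordRegistry {
  /** Register (or overwrite) the record for a service instance. */
  void register(String serviceKey, ServiceRecord record) throws IOException;

  /** Resolve the record for a service instance, or return null if absent. */
  ServiceRecord resolve(String serviceKey) throws IOException;

  /** Delete the record for a service instance. */
  void delete(String serviceKey) throws IOException;
}
{code}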
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099341#comment-16099341 ] Konstantinos Karanasos commented on YARN-3409: -- Yeah, I see there are important differences between node labels and attributes. It would be nice to unify them at some point, but I do see that this will require significantly more effort. That said, I think we should indeed not do proposal #2 or #3, as it will be confusing to share protobufs without sharing further functionality... {{add}} or {{set}} are fine. OK with keeping {{replace}}. Sounds a little ambiguous as it does not directly look like the existing attributes on the node will be removed, but we can make this clear in the description of the command. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6802) Support view leaf queue am resource usage in RM web ui
[ https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099328#comment-16099328 ] YunFan Zhou commented on YARN-6802: --- [~yufeigu] Thank Yufei, I will add Max AM Resource in this JIRA later. > Support view leaf queue am resource usage in RM web ui > -- > > Key: YARN-6802 > URL: https://issues.apache.org/jira/browse/YARN-6802 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.8.0 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > YARN-6802.001.patch > > > RM Web ui should support view leaf queue am resource usage. > !screenshot-2.png! > I will upload my patch later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YunFan Zhou updated YARN-6862: -- Comment: was deleted (was: [~jlowe] It is very likely that the process is exists, but the resource usage especially the used CPU is very problematic. I think we should fix it.) > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099321#comment-16099321 ] YunFan Zhou commented on YARN-6862: --- [~jlowe] It is very likely that the process is exists, but the resource usage especially the used CPU is very problematic. I think we should fix it. > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
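The -1 values in the report look like {{ResourceCalculatorProcessTree.UNAVAILABLE}}, which the process-tree samplers return when a reading cannot be taken. A minimal guard, shown only as an assumption about how consumers of these metrics could avoid publishing negatives (not an NM patch):

{code:java}
import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;

public final class MetricsGuard {
  private MetricsGuard() {
  }

  /** Clamp unavailable or otherwise negative samples to zero before reporting. */
  public static long sanitize(long sampled) {
    if (sampled == ResourceCalculatorProcessTree.UNAVAILABLE || sampled < 0) {
      return 0L;
    }
    return sampled;
  }
}
{code}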
[jira] [Updated] (YARN-6804) Allow custom hostname for docker containers in native services
[ https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-6804: - Attachment: YARN-6804-trunk.005.patch I believe the attached patch fixes the enforcer error. It excludes the new hadoop-yarn-registry transitive dependency in the hadoop-client-minicluster pom. > Allow custom hostname for docker containers in native services > -- > > Key: YARN-6804 > URL: https://issues.apache.org/jira/browse/YARN-6804 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Fix For: yarn-native-services > > Attachments: YARN-6804-trunk.004.patch, YARN-6804-trunk.005.patch, > YARN-6804-yarn-native-services.001.patch, > YARN-6804-yarn-native-services.002.patch, > YARN-6804-yarn-native-services.003.patch, > YARN-6804-yarn-native-services.004.patch, > YARN-6804-yarn-native-services.005.patch > > > Instead of the default random docker container hostname, we could set a more > user-friendly hostname for the container. The default could be a hostname > based on the container ID, with an option for the AM to provide a different > hostname. In the case of the native services AM, we could provide the > hostname that would be created by the registry DNS server. Regardless of > whether or not registry DNS is enabled, this would be a more useful hostname > for the docker container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099292#comment-16099292 ] Wangda Tan commented on YARN-3409: -- [~kkaranasos], Add my thoughts to your questions, [~naganarasimha...@apache.org] please add yours if you think differently. bq. It was not clear to me how the newly added node attributes are going to play with existing node labels. Is the plan to share some code or will it be completely separate? There're still many differences between partition and attribute, for example, we don't need queue-acl for node-attribute, and after revisit existing implementation, we may not need to support multiple NM launched on the same host with different ports as well. It may share some basic implementations (like node-label-manager), however for the API level it might be better to have a separate node-attribute-protocol, since adding them to NodeLabelProto looks too complex. The main part of "2. API proto changes" should be read as proposal#1, and "Alternate Proposal 1"/"Alternate Proposal 2" should be read as proposal #2 and #3. bq. Re: how applications will be specifying node attribute constraints. I think so, right? +Naga. bq. In the CLI API the replace and update seem a bit confusing to me ... Regarding to semantics of node attribute CLI, I think we all agree with {{update}} (adding new constraints or replacing the value of existing ones). Instead of calling it {{update}} or {{set}}, how about call it {{add}} (which overwrite value if key presents)? I suggest to keep {{replace}} as a single operation, since it is a superset of {{remove}} and {{remove+add}}, which we can provide atomic op as well. Also, I think it might be more frequently used by end users instead of a plain {{remove}} (what is the scenario we need to clean up all attribute on a node?). {{node1:att1=val1}} looks better than {{node1=att1:val1}}. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099277#comment-16099277 ] Konstantinos Karanasos commented on YARN-3409: -- Hi guys, Nice to see you are resuming work on this. I just checked the latest design document. Here are a couple of questions: * It was not clear to me how the newly added node attributes are going to play with existing node labels. Is the plan to share some code or will it be completely separate? I feel that there should be some unification. Not sure I understand the two alternatives you mention ("alternate proposal 1 & 2") compared to the solution you are proposing instead. * Re: how applications will be specifying node attribute constraints, is the plan to use the new constraints API we are introducing in YARN-6593? * In the CLI API the replace and update seem a bit confusing to me. Update is essentially adding new constraints or replacing the value of existing ones -- we could also call it set (and even have an extra parameter that determines if we override). Replace is about removing all existing ones and then adding new -- we could do it in two steps maybe? Also, I think it's more intuitive to do "node1:att1=val1" instead of "node1=att1:val1". > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6733) Add table for storing sub-application entities
[ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099207#comment-16099207 ] Vrushali C edited comment on YARN-6733 at 7/24/17 11:28 PM: So we thought that it will be good to keep the column name so that sub apps can store this information. For regular applications, the flow version can be used to determine whether optimizations are to be done. The flow version indicates if the flow has changed, that is, say if the pig script changes, its flow version will change. So then, for example, reducer estimation calculations can be done differently. This applies to the application entities. We discussed that it will be good to keep the same information for sub-apps in case they want to use this information in a similar fashion. As such, this column currently only exists in code, it's not taking up any disk space/hbase space etc if no one writes to it. But having it gives the framework developers a chance to use it if they want. was (Author: vrushalic): So we thought that it will be good to keep the column name so that sub apps can store this information. For regular applications, the flow version can be used to determine whether optimizations are to be done. The flow version indicates if the flow has changed, that is, say if the pig script changes, it's flow version will change. So then, for example, reducer estimation calculations can be done differently. This applies to the application entities. We discussed that it will be good to keep the same information for sub-apps in case they want to use this information in a similar fashion. As such, this column currently only exists in code, it's not taking up any disk space/hbase space etc if no one writes to it. But having it given the framework developers a chance to use it if they want. > Add table for storing sub-application entities > -- > > Key: YARN-6733 > URL: https://issues.apache.org/jira/browse/YARN-6733 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, > YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, > YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, > YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, > YARN-6733-YARN-5355.008.patch > > > After a discussion with Tez folks, we have been thinking over introducing a > table to store sub-application information. > For example, if a Tez session runs for a certain period as User X and runs a > few AMs. These AMs accept DAGs from other users. Tez will execute these dags > with a doAs user. ATSv2 should store this information in a new table perhaps > called as "sub_application" table. > This jira tracks the code changes needed for table schema creation. > I will file other jiras for writing to that table, updating the user name > fields to include sub-application user etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate
[ https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099257#comment-16099257 ] Wangda Tan commented on YARN-6610: -- Thanks [~templedf] for working on the patch and comments from [~sunilg]/[~yufeigu]. Tried to review the patch, it is already outdated, I will do a detailed review once the patch is updated. For one thing definitely need to be updated: Existing impl creates and uses TreeSet for every compare operation, which could be very slow according to testing of YARN-6788 (we expect another patch to completely remove map operations in the code path of Resource). I suggest to relook at the patch once we fill major performance gaps of YARN-3926 branch. > DominantResourceCalculator.getResourceAsValue() dominant param is no longer > appropriate > --- > > Key: YARN-6610 > URL: https://issues.apache.org/jira/browse/YARN-6610 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-6610.001.patch > > > The {{dominant}} param assumes there are only two resources, i.e. true means > to compare the dominant, and false means to compare the subordinate. Now > that there are _n_ resources, this parameter no longer makes sense. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099233#comment-16099233 ] Wangda Tan commented on YARN-6726: -- Thanks [~shaneku...@gmail.com] for the patch. Discussed with Shane offline, in general the approach looks good, I haven't done detailed reviews of code yet. Few comments/questions: 1) Could we do more strict container_id checking, checking string starts with container_ might not be enough? Probably you can check the method (validate_container_id) I added to YARN-6852. Which we can avoid less malicious kill container, etc. 2) {{LOGFILE flush}}, I'm not quite sure about this item, could you elaborate? 3) Regarding to comment from [~chris.douglas], bq. We also need to prevent the yarn user from becoming root ... If we can limit docker command only apply to containers launched by YARN (which we can use strict container_id pattern matching to identify that), it should be already much better than what we have today. We can implement other options such as enable/disable component, dynamic load libraries, etc. along with YARN-5673. > Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch > > > docker inspect, rm, stop, etc are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
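On the container ID validation point: container-executor itself is native C code, but the kind of strict check being suggested can be illustrated with a small sketch. The pattern below is an assumption based on the standard container ID format (e.g. {{container_e17_1410901177871_0001_01_000005}}), not the {{validate_container_id()}} implementation referenced from YARN-6852.

{code:java}
import java.util.regex.Pattern;

public class ContainerIdCheck {
  // Matches container_<timestamp>_<appId>_<attempt>_<containerId>, with an
  // optional epoch segment such as "e17".
  private static final Pattern CONTAINER_ID =
      Pattern.compile("container(_e\\d+)?_\\d+_\\d+_\\d+_\\d+");

  public static boolean isValidContainerId(String id) {
    return id != null && CONTAINER_ID.matcher(id).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidContainerId("container_e17_1410901177871_0001_01_000005")); // true
    System.out.println(isValidContainerId("container_x; rm -rf /"));                      // false
  }
}
{code}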
[jira] [Created] (YARN-6866) Minor clean-up and fixes in anticipation of merge with trunk
Subru Krishnan created YARN-6866: Summary: Minor clean-up and fixes in anticipation of merge with trunk Key: YARN-6866 URL: https://issues.apache.org/jira/browse/YARN-6866 Project: Hadoop YARN Issue Type: Sub-task Components: federation Reporter: Subru Krishnan Assignee: Botong Huang We have done e2e testing of YARN Federation successfully, and we have minor clean-ups like a pom version upgrade, a redundant "." in a configuration string, documentation updates, etc., which we want to clean up before the merge to trunk. This jira tracks the fixes we made as described above to ensure a proper e2e run. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6733) Add table for storing sub-application entities
[ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099207#comment-16099207 ] Vrushali C commented on YARN-6733: -- So we thought that it would be good to keep the column name so that sub-apps can store this information. For regular applications, the flow version can be used to determine whether optimizations are to be done. The flow version indicates if the flow has changed; that is, say, if the Pig script changes, its flow version will change. So then, for example, reducer estimation calculations can be done differently. This applies to the application entities. We discussed that it would be good to keep the same information for sub-apps in case they want to use this information in a similar fashion. As such, this column currently only exists in code; it's not taking up any disk space/HBase space, etc., if no one writes to it. But having it gives the framework developers a chance to use it if they want. > Add table for storing sub-application entities > -- > > Key: YARN-6733 > URL: https://issues.apache.org/jira/browse/YARN-6733 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, > YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, > YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, > YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, > YARN-6733-YARN-5355.008.patch > > > After a discussion with Tez folks, we have been thinking over introducing a > table to store sub-application information. > For example, if a Tez session runs for a certain period as User X and runs a > few AMs. These AMs accept DAGs from other users. Tez will execute these dags > with a doAs user. ATSv2 should store this information in a new table perhaps > called as "sub_application" table. > This jira tracks the code changes needed for table schema creation. > I will file other jiras for writing to that table, updating the user name > fields to include sub-application user etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare
[ https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099195#comment-16099195 ] Yufei Gu commented on YARN-6307: The overhead of this refactoring would be: 1. More checks on the variable {{res}}, at most 4 more checks, which probably need 4-8 more CPU instructions, which is fine. 2. Overhead of method invocation; the JVM does inline methods: https://stackoverflow.com/questions/2096361/are-there-inline-functions-in-java Moreover, this refactoring will reduce the computation of the fair share comparison, since the fair share ratio was computed unconditionally before my patch, while after my patch it is computed only if necessary. In that sense, I don't think we need to worry about performance. > Refactor FairShareComparator#compare > > > Key: YARN-6307 > URL: https://issues.apache.org/jira/browse/YARN-6307 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6307.001.patch, YARN-6307.002.patch, > YARN-6307.003.patch > > > The method does three things: compare the min share usage, compare fair share > usage by checking weight ratio, break tied by submit time and name. They are > mixed with each other which is not easy to read and maintenance, poor style. > Additionally, there are potential performance issues, like no need to check > weight ratio if minShare usage comparison already indicate the order. It is > worth to improve considering huge amount invokings in scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
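A minimal sketch of the structure being discussed (not the actual YARN-6307 patch; compareMinShareUsage and compareFairShareUsage are illustrative helper names for the min-share and fair-share steps): the more expensive fair-share ratio computation only runs when the min-share comparison has not already decided the order, and remaining ties are broken deterministically.
{code}
public int compare(Schedulable s1, Schedulable s2) {
  int res = compareMinShareUsage(s1, s2);
  if (res == 0) {
    // Only computed when the min-share comparison ties.
    res = compareFairShareUsage(s1, s2);
  }
  if (res == 0) {
    // Deterministic tie-breakers: submit time, then name.
    res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
  }
  if (res == 0) {
    res = s1.getName().compareTo(s2.getName());
  }
  return res;
}
{code}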
[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099193#comment-16099193 ] Wangda Tan commented on YARN-6852: -- [~chris.douglas], could you help review the approach and patch if you have bandwidth? > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch > > > This JIRA plans to add support for: > 1) Isolation in CGroups (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180 ] Jian He edited comment on YARN-6031 at 7/24/17 10:15 PM: - Ran into this patch when debugging same issue, got few questions: cc [~sunilg], [~Ying Zhang] 1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right ? in that case, the following logic becomes invalid. {code} amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery); } catch (InvalidLabelResourceRequestException e) { // This can happen if the application had been submitted and run // with Node Label enabled but recover with Node Label disabled. // Thus there might be node label expression in the application's // resource requests. If this is the case, create RmAppImpl with // null amReq and reject the application later with clear error // message. So that the application can still be tracked by RM // after recovery and user can see what's going on and react accordingly. if (isRecovery && !YarnConfiguration.areNodeLabelsEnabled(this.conf)) { if (LOG.isDebugEnabled()) { LOG.debug("AMResourceRequest is not created for " + applicationId + ". NodeLabel is not enabled in cluster, but AM resource " + "request contains a label expression."); } } else { throw e; } {code} 2. Below code directly transitions app to failed by using a Rejected event. The attempt state is not moved to failed, it'll be stuck there ? I think we need to send KILL event instead of REJECT event {code} if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) { String message = "Failed to recover application " + appId + ". NodeLabel is not enabled in cluster, but AM resource request " + "contains a label expression."; LOG.warn(message); application.handle( new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message)); return; } {code} 3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps. What's the disadvantage if we let app continue ? was (Author: jianhe): Ran into this patch when debugging, got few questions: cc [~sunilg], [~Ying Zhang] 1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right ? in that case, the following logic becomes invalid. {code} amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery); } catch (InvalidLabelResourceRequestException e) { // This can happen if the application had been submitted and run // with Node Label enabled but recover with Node Label disabled. // Thus there might be node label expression in the application's // resource requests. If this is the case, create RmAppImpl with // null amReq and reject the application later with clear error // message. So that the application can still be tracked by RM // after recovery and user can see what's going on and react accordingly. if (isRecovery && !YarnConfiguration.areNodeLabelsEnabled(this.conf)) { if (LOG.isDebugEnabled()) { LOG.debug("AMResourceRequest is not created for " + applicationId + ". NodeLabel is not enabled in cluster, but AM resource " + "request contains a label expression."); } } else { throw e; } {code} 2. Below code directly transitions app to failed by using a Rejected event. 
The attempt state is not moved to failed, it'll be stuck there ? I think we need to send KILL event instead of REJECT event {code} if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) { String message = "Failed to recover application " + appId + ". NodeLabel is not enabled in cluster, but AM resource request " + "contains a label expression."; LOG.warn(message); application.handle( new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message)); return; } {code} 3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps. What's the disadvantage if we let app continue ? > Application recovery has failed when node label feature is turned off during > RM recovery > > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project:
[jira] [Comment Edited] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180 ] Jian He edited comment on YARN-6031 at 7/24/17 10:08 PM: - Ran into this patch when debugging, got few questions: cc [~sunilg], [~Ying Zhang] 1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right ? in that case, the following logic becomes invalid. {code} amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery); } catch (InvalidLabelResourceRequestException e) { // This can happen if the application had been submitted and run // with Node Label enabled but recover with Node Label disabled. // Thus there might be node label expression in the application's // resource requests. If this is the case, create RmAppImpl with // null amReq and reject the application later with clear error // message. So that the application can still be tracked by RM // after recovery and user can see what's going on and react accordingly. if (isRecovery && !YarnConfiguration.areNodeLabelsEnabled(this.conf)) { if (LOG.isDebugEnabled()) { LOG.debug("AMResourceRequest is not created for " + applicationId + ". NodeLabel is not enabled in cluster, but AM resource " + "request contains a label expression."); } } else { throw e; } {code} 2. Below code directly transitions app to failed by using a Rejected event. The attempt state is not moved to failed, it'll be stuck there ? I think we need to send KILL event instead of REJECT event {code} if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) { String message = "Failed to recover application " + appId + ". NodeLabel is not enabled in cluster, but AM resource request " + "contains a label expression."; LOG.warn(message); application.handle( new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message)); return; } {code} 3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps. What's the disadvantage if we let app continue ? was (Author: jianhe): Ran into this patch when debugging, got few questions: cc [~sunilg], [~Ying Zhang] 1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right ? in that case, the following logic becomes invalid. {code} amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery); } catch (InvalidLabelResourceRequestException e) { // This can happen if the application had been submitted and run // with Node Label enabled but recover with Node Label disabled. // Thus there might be node label expression in the application's // resource requests. If this is the case, create RmAppImpl with // null amReq and reject the application later with clear error // message. So that the application can still be tracked by RM // after recovery and user can see what's going on and react accordingly. if (isRecovery && !YarnConfiguration.areNodeLabelsEnabled(this.conf)) { if (LOG.isDebugEnabled()) { LOG.debug("AMResourceRequest is not created for " + applicationId + ". NodeLabel is not enabled in cluster, but AM resource " + "request contains a label expression."); } } else { throw e; } {code} 2. Below code directly transitions app to failed by using a Rejected event. 
The attempt state is not moved to failed, it'll be stuck there ? {code} if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) { String message = "Failed to recover application " + appId + ". NodeLabel is not enabled in cluster, but AM resource request " + "contains a label expression."; LOG.warn(message); application.handle( new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message)); return; } {code} 3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps. What's the disadvantage if we let app continue ? > Application recovery has failed when node label feature is turned off during > RM recovery > > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler
[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180 ] Jian He commented on YARN-6031: --- Ran into this patch when debugging, got few questions: cc [~sunilg], [~Ying Zhang] 1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right ? in that case, the following logic becomes invalid. {code} amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery); } catch (InvalidLabelResourceRequestException e) { // This can happen if the application had been submitted and run // with Node Label enabled but recover with Node Label disabled. // Thus there might be node label expression in the application's // resource requests. If this is the case, create RmAppImpl with // null amReq and reject the application later with clear error // message. So that the application can still be tracked by RM // after recovery and user can see what's going on and react accordingly. if (isRecovery && !YarnConfiguration.areNodeLabelsEnabled(this.conf)) { if (LOG.isDebugEnabled()) { LOG.debug("AMResourceRequest is not created for " + applicationId + ". NodeLabel is not enabled in cluster, but AM resource " + "request contains a label expression."); } } else { throw e; } {code} 2. Below code directly transitions app to failed by using a Rejected event. The attempt state is not moved to failed, it'll be stuck there ? {code} if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) { String message = "Failed to recover application " + appId + ". NodeLabel is not enabled in cluster, but AM resource request " + "contains a label expression."; LOG.warn(message); application.handle( new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message)); return; } {code} 3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps. What's the disadvantage if we let app continue ? 
> Application recovery has failed when node label feature is turned off during > RM recovery > > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2 > > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, > YARN-6031.006.patch, YARN-6031.007.patch, YARN-6031-branch-2.8.001.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands,
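A sketch of the change suggested in point 2 above (an assumption, not a committed fix): reuse the existing recovery-failure block but drive the application through a KILL event so the recovered attempt also reaches a terminal state, instead of APP_REJECTED. Whether KILL has the right semantics here would need to be confirmed against RMAppImpl's state machine.
{code}
if (labelExp != null && !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
  String message = "Failed to recover application " + appId
      + ". NodeLabel is not enabled in cluster, but AM resource request"
      + " contains a label expression.";
  LOG.warn(message);
  // KILL instead of APP_REJECTED so the attempt is not left stuck (assumption).
  application.handle(new RMAppEvent(appId, RMAppEventType.KILL, message));
  return;
}
{code}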
[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare
[ https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099175#comment-16099175 ] Daniel Templeton commented on YARN-6307: LGTM. Given the frequency with which this method is called, any performance concerns about unrolling the nested _if_ statements? My hunch is that the compiler and/or JIT will make it ultimately irrelevant, but I didn't find anything conclusive online. > Refactor FairShareComparator#compare > > > Key: YARN-6307 > URL: https://issues.apache.org/jira/browse/YARN-6307 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6307.001.patch, YARN-6307.002.patch, > YARN-6307.003.patch > > > The method does three things: compare the min share usage, compare fair share > usage by checking weight ratio, break tied by submit time and name. They are > mixed with each other which is not easy to read and maintenance, poor style. > Additionally, there are potential performance issues, like no need to check > weight ratio if minShare usage comparison already indicate the order. It is > worth to improve considering huge amount invokings in scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare
[ https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099164#comment-16099164 ] Yufei Gu commented on YARN-6307: Thanks [~templedf] for the review. Uploaded patch v3 for your comments. > Refactor FairShareComparator#compare > > > Key: YARN-6307 > URL: https://issues.apache.org/jira/browse/YARN-6307 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6307.001.patch, YARN-6307.002.patch, > YARN-6307.003.patch > > > The method does three things: compare the min share usage, compare fair share > usage by checking weight ratio, break tied by submit time and name. They are > mixed with each other which is not easy to read and maintenance, poor style. > Additionally, there are potential performance issues, like no need to check > weight ratio if minShare usage comparison already indicate the order. It is > worth to improve considering huge amount invokings in scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6307) Refactor FairShareComparator#compare
[ https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6307: --- Attachment: YARN-6307.003.patch > Refactor FairShareComparator#compare > > > Key: YARN-6307 > URL: https://issues.apache.org/jira/browse/YARN-6307 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6307.001.patch, YARN-6307.002.patch, > YARN-6307.003.patch > > > The method does three things: compare the min share usage, compare fair share > usage by checking weight ratio, break tied by submit time and name. They are > mixed with each other which is not easy to read and maintenance, poor style. > Additionally, there are potential performance issues, like no need to check > weight ratio if minShare usage comparison already indicate the order. It is > worth to improve considering huge amount invokings in scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate
[ https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099146#comment-16099146 ] Yufei Gu commented on YARN-6610: Thanks [~templedf] for working on this. Some thoughts: # I like what you did in method getResourceAsDominantValue(). # The parameter "boolean singleType" is not honored in method {{compare(Resource clusterResource, Resource lhs, Resource rhs, boolean singleType)}}. # It may be easy to go to this branch with multiple resources, say, when only a few nodes have one type of resource and they go down, which makes the resource comparison less meaningful since {{compare(Resource lhs, Resource rhs)}} could easily return 0. I assume we could use a similar algorithm in {{compare(Resource clusterResource, Resource lhs, Resource rhs, boolean singleType)}}. {code} if (isInvalidDivisor(clusterResource)) { return this.compare(lhs, rhs); } {code} # Do you mind adding unit tests for {{compare(Resource clusterResource, Resource lhs, Resource rhs, boolean singleType)}}? # Extra space in line "resource. The share ..." > DominantResourceCalculator.getResourceAsValue() dominant param is no longer > appropriate > --- > > Key: YARN-6610 > URL: https://issues.apache.org/jira/browse/YARN-6610 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-6610.001.patch > > > The {{dominant}} param assumes there are only two resources, i.e. true means > to compare the dominant, and false means to compare the subordinate. Now > that there are _n_ resources, this parameter no longer makes sense. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099137#comment-16099137 ] Wangda Tan commented on YARN-3409: -- Thanks [~Naganarasimha] and inputs from [~sunilg]. The latest API design looks good, for bq. 2. API proto changes. I personally prefer adding new {{NodeAttributeProto}} and {{NodeToAttributeProto}} instead of changing existing {{NodeLabelProto}}. Also, should {{NodeIdToAttributesProto}} actually be {{NodeToAttributesProto}} since we don't want to support different ports on the same NM host, correct? > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare
[ https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099134#comment-16099134 ] Daniel Templeton commented on YARN-6307: Nice patch, [~yufeigu]. Here are my comments: # Since we're unnesting all the _if_ blocks, let's do that here, too: {code} if (res == 0) { // Apps are tied in fairness ratio. Break the tie by submit time and job // name to get a deterministic ordering, which is useful for unit tests. res = (int) Math.signum(s1.getStartTime() - s2.getStartTime()); if (res == 0) { res = s1.getName().compareTo(s2.getName()); } }{code} # Let's not stack the declarations: {code} double useToWeightRatio1, useToWeightRatio2;{code} Otherwise, looks good. > Refactor FairShareComparator#compare > > > Key: YARN-6307 > URL: https://issues.apache.org/jira/browse/YARN-6307 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6307.001.patch, YARN-6307.002.patch > > > The method does three things: compare the min share usage, compare fair share > usage by checking weight ratio, break tied by submit time and name. They are > mixed with each other which is not easy to read and maintenance, poor style. > Additionally, there are potential performance issues, like no need to check > weight ratio if minShare usage comparison already indicate the order. It is > worth to improve considering huge amount invokings in scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6802) Support view leaf queue am resource usage in RM web ui
[ https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099121#comment-16099121 ] Yufei Gu commented on YARN-6802: Thanks [~daemon] for working on this. It is definitely useful. I filed YARN-6468 (Add Max AM Resource and AM Resource Usage to FairScheduler WebUI) several months ago, which is similar to this JIRA. Do you mind adding Max AM Resource in this JIRA and closing YARN-6468 as a duplicate? > Support view leaf queue am resource usage in RM web ui > -- > > Key: YARN-6802 > URL: https://issues.apache.org/jira/browse/YARN-6802 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.8.0 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > YARN-6802.001.patch > > > RM Web ui should support view leaf queue am resource usage. > !screenshot-2.png! > I will upload my patch later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6865) FSLeafQueue.context should be final
Daniel Templeton created YARN-6865: -- Summary: FSLeafQueue.context should be final Key: YARN-6865 URL: https://issues.apache.org/jira/browse/YARN-6865 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 3.0.0-alpha4 Reporter: Daniel Templeton Assignee: Laura Torres Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6864) FSPreemptionThread cleanup for readability
[ https://issues.apache.org/jira/browse/YARN-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6864: --- Issue Type: Improvement (was: Bug) > FSPreemptionThread cleanup for readability > -- > > Key: YARN-6864 > URL: https://issues.apache.org/jira/browse/YARN-6864 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Attachments: YARN-6864.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6845) Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue.
[ https://issues.apache.org/jira/browse/YARN-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099092#comment-16099092 ] Daniel Templeton commented on YARN-6845: LGTM +1 > Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue. > --- > > Key: YARN-6845 > URL: https://issues.apache.org/jira/browse/YARN-6845 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Trivial > Attachments: YARN-6845.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6864) FSPreemptionThread cleanup for readability
[ https://issues.apache.org/jira/browse/YARN-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6864: --- Attachment: YARN-6864.001.patch > FSPreemptionThread cleanup for readability > -- > > Key: YARN-6864 > URL: https://issues.apache.org/jira/browse/YARN-6864 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Attachments: YARN-6864.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6864) FSPreemptionThread cleanup for readability
Daniel Templeton created YARN-6864: -- Summary: FSPreemptionThread cleanup for readability Key: YARN-6864 URL: https://issues.apache.org/jira/browse/YARN-6864 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 3.0.0-alpha4 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6863) Fair Scheduler preemption thread should check that a container has the needed resources before adding it to the preemption list
[ https://issues.apache.org/jira/browse/YARN-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6863: --- Description: In {{FSPreemptionThread.identifyContainersToPreemptOnNode()}}, we add every container we encounter to the preemption list until we meet the desired target. As we head into resource types, that behavior will become more of a problem, but it's technically an issue already because the fair scheduler supports requests with 0 vcores or 0 memory. (was: In {{FSPreemptionThread.identifyContainersToPreempt()}}, we add every container we encounter to the preemption list until we meet the desired target. As we head into resource types, that behavior will become more of a problem, but it's technically an issue already because the fair scheduler supports requests with 0 vcores or 0 memory.) > Fair Scheduler preemption thread should check that a container has the needed > resources before adding it to the preemption list > --- > > Key: YARN-6863 > URL: https://issues.apache.org/jira/browse/YARN-6863 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Priority: Minor > > In {{FSPreemptionThread.identifyContainersToPreemptOnNode()}}, we add every > container we encounter to the preemption list until we meet the desired > target. As we head into resource types, that behavior will become more of a > problem, but it's technically an issue already because the fair scheduler > supports requests with 0 vcores or 0 memory. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6863) Fair Scheduler preemption thread should check that a container has the needed resources before adding it to the preemption list
Daniel Templeton created YARN-6863: -- Summary: Fair Scheduler preemption thread should check that a container has the needed resources before adding it to the preemption list Key: YARN-6863 URL: https://issues.apache.org/jira/browse/YARN-6863 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 3.0.0-alpha4 Reporter: Daniel Templeton Priority: Minor In {{FSPreemptionThread.identifyContainersToPreempt()}}, we add every container we encounter to the preemption list until we meet the desired target. As we head into resource types, that behavior will become more of a problem, but it's technically an issue already because the fair scheduler supports requests with 0 vcores or 0 memory. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
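For illustration, the check this JIRA asks for could look roughly like the sketch below (a hypothetical helper, not from an actual patch): a container is only added to the preemption list if it supplies at least one resource the starved request still needs, so containers with 0 vcores or 0 memory are not preempted pointlessly. Resource here is org.apache.hadoop.yarn.api.records.Resource.
{code}
// Hypothetical helper: does preempting this container actually help the request?
private static boolean contributesToNeed(Resource containerRes, Resource stillNeeded) {
  return (stillNeeded.getMemorySize() > 0 && containerRes.getMemorySize() > 0)
      || (stillNeeded.getVirtualCores() > 0 && containerRes.getVirtualCores() > 0);
}
{code}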
[jira] [Commented] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099045#comment-16099045 ] Hadoop QA commented on YARN-6788: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} YARN-3926 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 44s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 41s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 30s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 51s{color} | {color:green} YARN-3926 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 59s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3926 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s{color} | {color:green} YARN-3926 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 13 new + 125 unchanged - 17 fixed = 138 total (was 142) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api generated 3 new + 0 unchanged - 1 fixed = 3 total (was 1) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 25s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 0s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 28s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 31s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api | | | org.apache.hadoop.yarn.api.records.impl.BaseResource.getResources() may expose internal
[jira] [Updated] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI
[ https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated YARN-5146: --- Attachment: YARN-5146.004.patch > [YARN-3368] Supports Fair Scheduler in new YARN UI > -- > > Key: YARN-5146 > URL: https://issues.apache.org/jira/browse/YARN-5146 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Abdullah Yousufi > Attachments: YARN-5146.001.patch, YARN-5146.002.patch, > YARN-5146.003.patch, YARN-5146.004.patch > > > Current implementation in branch YARN-3368 only support capacity scheduler, > we want to make it support fair scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting
[ https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned YARN-6779: -- Assignee: Yeliang Cang (was: Laura Torres) > DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() > should be @VisibleForTesting > > > Key: YARN-6779 > URL: https://issues.apache.org/jira/browse/YARN-6779 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.1, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Yeliang Cang >Priority: Trivial > Labels: newbie > Attachments: YARN-6779-001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString
[ https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098979#comment-16098979 ] Hudson commented on YARN-6768: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12049 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12049/]) YARN-6768. Improve performance of yarn api record toString and (jlowe: rev 24853bf32a045b8f029fb136edca2af03836c8d5) * (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestFastNumberFormat.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ReservationId.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java * (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/FastNumberFormat.java > Improve performance of yarn api record toString and fromString > -- > > Key: YARN-6768 > URL: https://issues.apache.org/jira/browse/YARN-6768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2 > > Attachments: YARN-6768.1.patch, YARN-6768.2.patch, YARN-6768.3.patch, > YARN-6768.4.patch, YARN-6768.5.patch, YARN-6768.6.patch, YARN-6768.7.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
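The idea behind the new formatting helper, sketched below for illustration (the method name and signature are illustrative, not necessarily the committed FastNumberFormat API): append a zero-padded number directly to an existing StringBuilder instead of creating a NumberFormat instance on every toString() call, which is what makes record toString cheaper.
{code}
// Illustrative sketch; assumes a non-negative value.
public static StringBuilder appendPadded(StringBuilder sb, long value, int minDigits) {
  String digits = Long.toString(value);
  for (int i = digits.length(); i < minDigits; i++) {
    sb.append('0');
  }
  return sb.append(digits);
}
{code}
A container suffix such as _000005 would then be produced by something like appendPadded(sb.append('_'), 5, 6), with no formatter object allocated.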
[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
[ https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098959#comment-16098959 ] Suma Shivaprasad commented on YARN-5219: [~sunilg] Yes I meant "set -o pipefail -e" . Sorry about the typo earlier. > When an export var command fails in launch_container.sh, the full container > launch should fail > -- > > Key: YARN-5219 > URL: https://issues.apache.org/jira/browse/YARN-5219 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Hitesh Shah >Assignee: Sunil G > Attachments: YARN-5219.001.patch, YARN-5219.003.patch, > YARN-5219.004.patch, YARN-5219.005.patch, YARN-5219.006.patch, > YARN-5219-branch-2.001.patch > > > Today, a container fails if certain files fail to localize. However, if > certain env vars fail to get setup properly either due to bugs in the yarn > application or misconfiguration, the actual process launch still gets > triggered. This results in either confusing error messages if the process > fails to launch or worse yet the process launches but then starts behaving > wrongly if the env var is used to control some behavioral aspects. > In this scenario, the issue was reproduced by trying to do export > abc="$\{foo.bar}" which is invalid as var names cannot contain "." in bash. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey
[ https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098922#comment-16098922 ] Ray Chiang commented on YARN-6150: -- +1 Thanks [~ajisakaa] for digging into this. > TestContainerManagerSecurity tests for Yarn Server are flakey > - > > Key: YARN-6150 > URL: https://issues.apache.org/jira/browse/YARN-6150 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Daniel Sturman >Assignee: Daniel Sturman > Attachments: YARN-6150.001.patch, YARN-6150.002.patch, > YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, > YARN-6150.006.patch, YARN-6150.007.patch > > > Repeated runs of > {{org.apache.hadoop.yarn.server.TestContainerManagedSecurity}} can either > pass or fail on repeated runs on the same codebase. Also, the two runs (one > in secure mode, one without security) aren't well labeled in JUnit. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6788: -- Attachment: YARN-6788-YARN-3926.012.patch As discussed earlier, I will be suppressing 3 findbugs warnings which are related to exposing the internal representation. But in fact, I return a read-only array. Doing a copy on every getter call had a significant impact on performance. > Improve performance of resource profile branch > -- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G >Priority: Blocker > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, > YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, > YARN-6788-YARN-3926.012.patch > > > Currently we could see a 15% performance delta with this branch. > Few performance improvements to improve the same. > Also this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
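A minimal sketch of the trade-off described above (illustrative, not the exact YARN-6788 code): the hot-path getter exposes the internal ResourceInformation[] and documents it as read-only, which is what FindBugs reports as "may expose internal representation"; the defensive-copy alternative is shown for contrast.
{code}
// Hot-path getter: no defensive copy; callers must treat the array as read-only.
public ResourceInformation[] getResources() {
  return resources;
}

// The alternative that proved too costly per call (uses java.util.Arrays):
public ResourceInformation[] getResourcesCopy() {
  return Arrays.copyOf(resources, resources.length);
}
{code}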
[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK
[ https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098902#comment-16098902 ] Ellen Hui commented on YARN-6413: - bq. In RegistryDNSServer, all the methods removed are essential to registryDNS - the implementation of RegistryDNS is depending on the zookeeper’s listener functionality. This is one big difference from the state store implementation. we can not remove those methods. I did not remove any methods, I just didn't add the ones that exist in yarn-native-services but not trunk. I thought doing so would probably cause more conflicts than not. bq. The path is used by RegistryDNSServer for reconstructing the DNS record(e.g. BaseServiceRecordProcessor#getContainerName). If we change the path, everything there will break. Also, the registry documentation needs to change. My understanding was that DNS went through ZK directly without going through the interface, so it wouldn't be affected by the service records setting up the path differently. I can change the path construction back for the ZK impl if it needs that. > Decouple Yarn Registry API from ZK > -- > > Key: YARN-6413 > URL: https://issues.apache.org/jira/browse/YARN-6413 > Project: Hadoop YARN > Issue Type: Improvement > Components: amrmproxy, api, resourcemanager >Reporter: Ellen Hui >Assignee: Ellen Hui > Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch > > > Right now the Yarn Registry API (defined in the RegistryOperations interface) > is a very thin layer over Zookeeper. This jira proposes changing the > interface to abstract away the implementation details so that we can write a > FS-based implementation of the registry service, which will be used to > support AMRMProxy HA. > The new interface will use register/delete/resolve APIs instead of > Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
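For illustration, a backend-agnostic registry interface of the kind this JIRA describes (register/delete/resolve rather than ZK-style mknode) might look roughly like the sketch below; the interface and method names are hypothetical and do not reflect the actual patch.
{code}
import java.io.IOException;
import org.apache.hadoop.registry.client.types.ServiceRecord;

// Hypothetical shape of a store that could be backed by ZooKeeper or a filesystem.
public interface ServiceRecordStore {
  void register(String key, ServiceRecord record) throws IOException;
  ServiceRecord resolve(String key) throws IOException;
  void delete(String key) throws IOException;
}
{code}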
[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098784#comment-16098784 ] Hadoop QA commented on YARN-5548: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 18 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 11s{color} | {color:orange} root: The patch generated 7 new + 963 unchanged - 8 fixed = 970 total (was 971) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 45s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 6s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-5548 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878633/YARN-5548.0016.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ecbc46fe0d0c 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 770cc46 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16532/artifact/patchprocess/diff-checkstyle-root.txt | | unit |
[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098634#comment-16098634 ] YunFan Zhou commented on YARN-6862: --- [~jlowe] It is very likely that the process exists, but the resource usage, especially the used CPU, is very problematic. I think we should fix it. > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098609#comment-16098609 ] Jason Lowe commented on YARN-6862: -- I believe the case of it returning -1B is when the process exited just as the resource monitor was going to examine it. It's an invalid result because there is no process there. We should not be aggregating those results if that's indeed the case. > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
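A minimal sketch of the guard Jason describes, for illustration only (this is a hypothetical helper, not the actual ContainersMonitorImpl code): samples that report -1 because the container process exited mid-scan are dropped before aggregation, so the node-level totals never go negative.
{code}
// Illustrative aggregator (hypothetical class, not actual NM code): ignore samples
// from process trees that reported -1 because the process exited before it was measured.
public final class ResourceUsageAggregator {

  // -1 is what the per-process probe reports once the process is gone.
  private static final long UNAVAILABLE = -1L;

  private long totalPmemBytes;
  private long totalVmemBytes;
  private float totalCpuPercent;

  /** Add one container sample, skipping values from processes that vanished mid-scan. */
  public void addSample(long pmemBytes, long vmemBytes, float cpuPercent) {
    if (pmemBytes == UNAVAILABLE || vmemBytes == UNAVAILABLE || cpuPercent < 0f) {
      return; // stale sample; aggregating it would drive the totals negative
    }
    totalPmemBytes += pmemBytes;
    totalVmemBytes += vmemBytes;
    totalCpuPercent += cpuPercent;
  }

  public long getTotalPmemBytes() { return totalPmemBytes; }
  public long getTotalVmemBytes() { return totalVmemBytes; }
  public float getTotalCpuPercent() { return totalCpuPercent; }
}
{code}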
[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098603#comment-16098603 ] YunFan Zhou commented on YARN-6862: --- [~sunilg] Thanks. We can only see used memory in the NM logs, where we see entries such as the following: 2017-07-24 22:19:08,551 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 23933 for container-id container_e6717_1500903083707_0014_01_000259: -1B of 1 GB physical memory used; -1B of 2.1 GB virtual memory used Because we collect the resource usage metrics directly from the MonitoringThread#run method, the metrics are reliable. > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6842) Implement a new access type for queue
[ https://issues.apache.org/jira/browse/YARN-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098594#comment-16098594 ] YunFan Zhou commented on YARN-6842: --- [~bibinchundatt] Thanks Bibin, but your solution cannot meet our requirements completely. The shortcomings are as follows: 1. For some users, we may always want them to be able to view our applications. If we do that by setting the ApplicationAccessType#VIEW_APP acl rights in the containerLaunchContext, we have to set it on every submission, which is tedious and redundant. On the other hand, with a queue-level ACL an administrator can grant VIEW_APP permission to other users, and this privilege is independent of the submitter's authority. It can be understood as an authorization that is different from an administrator's but more than a regular user's. 2. It cannot authorize users to view an application that has already been submitted. All in all, we should implement a new access type for queue. > Implement a new access type for queue > - > > Key: YARN-6842 > URL: https://issues.apache.org/jira/browse/YARN-6842 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.8.2 > > Attachments: YARN-6842.001.patch, YARN-6842.002.patch, > YARN-6842.003.patch > > > When we want to access applications of a queue, only we can do is become the > administer of the queue at present. > But sometimes we only want authorize someone view applications of a queue > but not modify operation. > In our current mechanism there isn't any way to meet it, so I will implement > a new access type for queue to solve > this problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString
[ https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098567#comment-16098567 ] Jason Lowe commented on YARN-6768: -- +1 for the latest patch. The unit test failures are unrelated. Committing this. > Improve performance of yarn api record toString and fromString > -- > > Key: YARN-6768 > URL: https://issues.apache.org/jira/browse/YARN-6768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-6768.1.patch, YARN-6768.2.patch, YARN-6768.3.patch, > YARN-6768.4.patch, YARN-6768.5.patch, YARN-6768.6.patch, YARN-6768.7.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6858) Attribute Manager to store and provide the attributes in RM
[ https://issues.apache.org/jira/browse/YARN-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098541#comment-16098541 ] Arun Suresh commented on YARN-6858: --- Thanks for raising these [~Naganarasimha], I was wondering if, instead of a new component, it would be sufficient to add the attributes to the {{SchedulerNode}} itself, and have an interface in the {{ClusterNodeTracker}} to query/list nodes via attributes? > Attribute Manager to store and provide the attributes in RM > --- > > Key: YARN-6858 > URL: https://issues.apache.org/jira/browse/YARN-6858 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > Similar to CommonNodeLabelsManager we need to have a centralized manager for > Node Attributes too. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
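A purely hypothetical sketch of the alternative Arun raises, assuming attributes would live on the scheduler's node object and be queried through the node tracker; the interface and method names below are illustrative stand-ins, not existing Hadoop APIs.
{code}
// Hypothetical shape of the alternative (names are illustrative, not real YARN APIs):
// attributes held per node, with the tracker exposing an attribute-based lookup.
import java.util.List;
import java.util.Map;

interface AttributeAwareNode {
  /** e.g. "JDK.version" -> "8u20", "os" -> "linux" */
  Map<String, String> getNodeAttributes();
}

interface AttributeAwareNodeTracker<N extends AttributeAwareNode> {
  /** List the nodes whose attributes contain every requested key/value pair. */
  List<N> getNodesByAttributes(Map<String, String> requested);
}
{code}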
[jira] [Updated] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3409: Attachment: 3409-apiChanges_v2.pdf (4).pdf Attached the document for Proto, CLI & REST documentation > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specify only one label for each node (IAW, partition a cluster) is a way to > determinate how resources of a special set of nodes could be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > has following characteristics: > - Cluster divided to several disjoint sub clusters. > - ACL/priority can apply on partition (Only market team / marke team has > priority to use the partition). > - Percentage of capacities can apply on partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partition, they’re describing features of node’s > hardware/software just for affinity. Some example of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, application can be able to ask for resource has (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5548: --- Attachment: YARN-5548.0016.patch Attaching rebase patch again .. > Use MockRMMemoryStateStore to reduce test failures > -- > > Key: YARN-5548 > URL: https://issues.apache.org/jira/browse/YARN-5548 > Project: Hadoop YARN > Issue Type: Test >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-easy, test > Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, > YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, > YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, > YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, > YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, > YARN-5548.0015.patch, YARN-5548.0016.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > Error Message > Stacktrace > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1471885197388 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 0 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 > application_state: RMAPP_FINISHED finish_time: 1471885197478> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6862) Nodemanager resource usage metrics sometimes are negative
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-6862: - Summary: Nodemanager resource usage metrics sometimes are negative (was: There is a bug in computing resource usage in NM.) Target Version/s: 2.8.2 Fix Version/s: (was: 2.8.2) I updated the summary to be something more specific. Also please do not set the Fix version field, as that should only be set once a patch is committed to one or more branches. The Target Version is intended to track the intended version(s) for the fix. > Nodemanager resource usage metrics sometimes are negative > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM
[ https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098474#comment-16098474 ] Varun Saxena commented on YARN-6130: Alternatively, we can also compare the token on every allocate response. It would be only 50-60 bytes and would anyways be in a separate RM Allocation thread, no matter which AM we talk about. > [ATSv2 Security] Generate a delegation token for AM when app collector is > created and pass it to AM via NM and RM > - > > Key: YARN-6130 > URL: https://issues.apache.org/jira/browse/YARN-6130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-6130-YARN-5355.01.patch, > YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, > YARN-6130-YARN-5355.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6842) Implement a new access type for queue
[ https://issues.apache.org/jira/browse/YARN-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098469#comment-16098469 ] Bibin A Chundatt commented on YARN-6842: [~daemon] During application submission we could set {{ApplicationAccessType#VIEW_APP}} acl rights in the containerLaunchContext. Does that solve your use case? > Implement a new access type for queue > - > > Key: YARN-6842 > URL: https://issues.apache.org/jira/browse/YARN-6842 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.2 >Reporter: YunFan Zhou >Assignee: YunFan Zhou > Fix For: 2.8.2 > > Attachments: YARN-6842.001.patch, YARN-6842.002.patch, > YARN-6842.003.patch > > > When we want to access applications of a queue, only we can do is become the > administer of the queue at present. > But sometimes we only want authorize someone view applications of a queue > but not modify operation. > In our current mechanism there isn't any way to meet it, so I will implement > a new access type for queue to solve > this problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
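A minimal sketch of Bibin's suggestion, assuming the submitter sets the view ACL on the AM ContainerLaunchContext at submission time; the helper class and method names are illustrative, and the viewer string is assumed to follow the usual "users groups" ACL convention.
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class ViewAclExample {
  /** Grant view-only access on this application to the given users/groups. */
  public static void setViewAcl(ContainerLaunchContext amContainer, String viewers) {
    Map<ApplicationAccessType, String> acls = new HashMap<>();
    acls.put(ApplicationAccessType.VIEW_APP, viewers); // e.g. "alice,bob analysts"
    amContainer.setApplicationACLs(acls);
  }
}
{code}
The drawback YunFan points out above still applies: this has to be repeated on every submission and is controlled by the submitter rather than by a queue administrator.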
[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM
[ https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098275#comment-16098275 ] Varun Saxena commented on YARN-6130: Thanks [~jianhe] and [~rohithsharma] for the reviews. Sorry, I could not reply earlier. bq. The AllocateResponse#newInstance method may be not needed. I think if we have the Builder pattern, we don’t need to keep on adding newInstance methods anymore Ok. bq. Even without rmIdentifies, if token is updated with same rm_identifiers then AM has to update it right? Am I missing any particular scenario? I was thinking of caching the RM id and version that came when the token was last updated in the MapReduce AM, so that we can match against them. This was to avoid unnecessarily re-adding tokens to the UGI if the said token has already been updated. If the token service already exists in the tokenMap, which would be true just about every time, then while adding the token in Credentials#addToken we iterate over all the available tokens. This was a small optimization for that. Assuming the AM may not have too many tokens, iterating over the token map may not be that costly though. Thoughts? > [ATSv2 Security] Generate a delegation token for AM when app collector is > created and pass it to AM via NM and RM > - > > Key: YARN-6130 > URL: https://issues.apache.org/jira/browse/YARN-6130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-6130-YARN-5355.01.patch, > YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, > YARN-6130-YARN-5355.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
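A rough sketch of the caching idea discussed above, assuming the AM keeps the last collector token it pushed into the UGI and only re-adds the token when it actually changes; the class and method names are illustrative, not part of the patch.
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class CollectorTokenUpdater {
  // Last token pushed into the UGI; null until the first allocate response carries one.
  private Token<? extends TokenIdentifier> lastToken;

  /** Re-add the collector token only when it differs from the cached one. */
  public synchronized void maybeUpdate(Token<? extends TokenIdentifier> newToken)
      throws IOException {
    if (newToken == null || newToken.equals(lastToken)) {
      return; // unchanged; skip the scan over the credentials token map
    }
    UserGroupInformation.getCurrentUser().addToken(newToken);
    lastToken = newToken;
  }
}
{code}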
[jira] [Commented] (YARN-6862) There is a bug in computing resource usage in NM.
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098249#comment-16098249 ] Sunil G commented on YARN-6862: --- [~daemon] Thanks for raising the jira. Could you please share some more information regarding cluster and logs if any (NM logs). > There is a bug in computing resource usage in NM. > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > Fix For: 2.8.2 > > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover
[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098247#comment-16098247 ] Rohith Sharma K S commented on YARN-6102: - test failures are unrelated to the patch.. there are other open JIRA exist. > RMActiveService context to be updated with new RMContext on failover > > > Key: YARN-6102 > URL: https://issues.apache.org/jira/browse/YARN-6102 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 2.7.3 >Reporter: Ajith S >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, > YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, > YARN-6102.06.patch, YARN-6102.07.patch, YARN-6102-branch-2.001.patch, > YARN-6102-branch-2.002.patch > > > {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in > dispatcher thread > java.lang.Exception: No handler for registered for class > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120) > at java.lang.Thread.run(Thread.java:745) > 2017-01-17 16:42:17,914 INFO [AsyncDispatcher ShutDown handler] > event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code} > The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits > abnormally, after some analysis, i was able to reproduce. > Once the nodeHeartBeat is sent to RM, inside > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}}, > before sending it to dispatcher through > {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} > if RM failover is called, the dispatcher is reset > The new dispatcher is however first started and then the events are > registered at > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}} > So event order will look like > 1. Send Node heartbeat to {{ResourceTrackerService}} > 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher > call RM failover > 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( > {{resetDispatcher();}} + {{createAndInitActiveServices();}} ) > Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , > the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher > This will cause the above error as at point of time when {{STATUS_UPDATE}} > event is given to dispatcher in {{ResourceTrackerService}} , the new > dispatcher(from the failover) may be started but not yet registered for events > Using same steps(with pausing JVM at debug), i was able to reproduce this in > production cluster also. for {{STATUS_UPDATE}} active service event, when the > service is yet to forward the event to RM dispatcher but a failover is called > and dispatcher reset is between {{resetDispatcher();}} & > {{createAndInitActiveServices();}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
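A toy illustration of the failure mode in the description above (not the real AsyncDispatcher): an event dispatched after the dispatcher has been reset but before its handlers are re-registered has no handler and fails, matching the FATAL "No handler for registered for class" log at the top of the description.
{code}
import java.util.HashMap;
import java.util.Map;

// Toy dispatcher, for illustration only: register() stands in for the handler registration
// done in createAndInitActiveServices(); dispatching before that step reproduces the error.
class ToyDispatcher {
  private final Map<Class<?>, Runnable> handlers = new HashMap<>();

  void register(Class<?> eventType, Runnable handler) {
    handlers.put(eventType, handler);
  }

  void dispatch(Class<?> eventType) {
    Runnable handler = handlers.get(eventType);
    if (handler == null) {
      // corresponds to the "Error in dispatcher thread" FATAL in the log above
      throw new IllegalStateException("No handler registered for " + eventType.getName());
    }
    handler.run();
  }
}
{code}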
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098234#comment-16098234 ] Sunil G commented on YARN-3254: --- Thanks [~suma.shivaprasad] Generally the patch looks fine. A few minor comments. # I think {{DirectoryCollection#getErroredDirs}} is a public api. Could you please mark it as evolving and add some more information to the api? I think the existing javadoc is not that great there, so it would be better to improve it. # {{fullLocalDirsList}} could be renamed to {{diskFullLocalDirsList}} or some better name :) similar for the log dirs as well. > HealthReport should include disk full information > - > > Key: YARN-3254 > URL: https://issues.apache.org/jira/browse/YARN-3254 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Akira Ajisaka >Assignee: Suma Shivaprasad > Fix For: 3.0.0-beta1 > > Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot > 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, > YARN-3254-003.patch > > > When a NodeManager's local disk gets almost full, the NodeManager sends a > health report to ResourceManager that "local/log dir is bad" and the message > is displayed on ResourceManager Web UI. It's difficult for users to detect > why the dir is bad. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098211#comment-16098211 ] Hadoop QA commented on YARN-6788: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} YARN-3926 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 44s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 16s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 14s{color} | {color:green} YARN-3926 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 25s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3926 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s{color} | {color:green} YARN-3926 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 0s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 23 new + 126 unchanged - 16 fixed = 149 total (was 142) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api generated 4 new + 0 unchanged - 1 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 40s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 59s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 45s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api | | | Possible null pointer dereference of a on branch that might be infeasible in org.apache.hadoop.yarn.api.records.Resource.equals(Object) Dereferenced at Resource.java:a on branch that might be infeasible in org.apache.hadoop.yarn.api.records.Resource.equals(Object) Dereferenced at Resource.java:[line 358] | | | org.apache.hadoop.yarn.api.records.impl.BaseResource.getResources() may expose internal representation by returning BaseResource.resources At BaseResource.java:by returning BaseResource.resources At BaseResource.java:[line 131] | | | Public static org.apache.hadoop.yarn.util.resource.ResourceUtils.getResourceNamesArray() may
[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover
[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098205#comment-16098205 ] Hadoop QA commented on YARN-6102: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 137 unchanged - 9 fixed = 137 total (was 146) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 59s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}107m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | | JDK v1.7.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5e40efe | | JIRA Issue | YARN-6102 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878594/YARN-6102-branch-2.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname
[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
[ https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098188#comment-16098188 ] Sunil G commented on YARN-5219: --- Thanks [~suma.shivaprasad] I think you meant "set -o pipefail -e". It makes sense to me. Could you please confirm. > When an export var command fails in launch_container.sh, the full container > launch should fail > -- > > Key: YARN-5219 > URL: https://issues.apache.org/jira/browse/YARN-5219 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Hitesh Shah >Assignee: Sunil G > Attachments: YARN-5219.001.patch, YARN-5219.003.patch, > YARN-5219.004.patch, YARN-5219.005.patch, YARN-5219.006.patch, > YARN-5219-branch-2.001.patch > > > Today, a container fails if certain files fail to localize. However, if > certain env vars fail to get setup properly either due to bugs in the yarn > application or misconfiguration, the actual process launch still gets > triggered. This results in either confusing error messages if the process > fails to launch or worse yet the process launches but then starts behaving > wrongly if the env var is used to control some behavioral aspects. > In this scenario, the issue was reproduced by trying to do export > abc="$\{foo.bar}" which is invalid as var names cannot contain "." in bash. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6862) There is a bug in computing resource usage in NM.
[ https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YunFan Zhou updated YARN-6862: -- Description: When we collect real-time metrics of resource usage in NM, we found those values sometimes are invalid. For example, the following are values when collected at some point: "milliVcoresUsed":-5808, "currentPmemUsage":-1, "currentVmemUsage":-1, "cpuUsagePercentPerCore":-968.1026 "cpuUsageTotalCoresPercentage":-24.202564, "pmemLimit":2147483648, "vmemLimit":4509715456 There are many negative values, there may a bug in NM. We should fix it, because the real-time metrics of NM is pretty important for us sometimes. > There is a bug in computing resource usage in NM. > - > > Key: YARN-6862 > URL: https://issues.apache.org/jira/browse/YARN-6862 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.2 >Reporter: YunFan Zhou > Fix For: 2.8.2 > > > When we collect real-time metrics of resource usage in NM, we found those > values sometimes are invalid. > For example, the following are values when collected at some point: > "milliVcoresUsed":-5808, > "currentPmemUsage":-1, > "currentVmemUsage":-1, > "cpuUsagePercentPerCore":-968.1026 > "cpuUsageTotalCoresPercentage":-24.202564, > "pmemLimit":2147483648, > "vmemLimit":4509715456 > There are many negative values, there may a bug in NM. > We should fix it, because the real-time metrics of NM is pretty important for > us sometimes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6862) There is a bug in computing resource usage in NM.
YunFan Zhou created YARN-6862: - Summary: There is a bug in computing resource usage in NM. Key: YARN-6862 URL: https://issues.apache.org/jira/browse/YARN-6862 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.2 Reporter: YunFan Zhou Fix For: 2.8.2 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098177#comment-16098177 ] Sunil G commented on YARN-5892: --- Hi [~eepayne] Thank you very much for the effort. Generally the patch looks fine to me except for the doubt below. In {{ActiveUsersManager}}, could we avoid *activeUsersChanged* if possible? Maybe we could keep an active set in ActiveUsersManager itself and clear this set when activate/deactivateApplication is invoked. > Support user-specific minimum user limit percentage in Capacity Scheduler > - > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0-alpha3 > > Attachments: Active users highlighted.jpg, YARN-5892.001.patch, > YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, > YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, > YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, > YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, > YARN-5892.015.patch, YARN-5892.branch-2.015.patch, > YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, > YARN-5892.branch-2.8.017.patch, YARN-5892.branch-2.8.018.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this: > {code} > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name> > <value>25</value> > </property> > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name> > <value>75</value> > </property> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service
[ https://issues.apache.org/jira/browse/YARN-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098174#comment-16098174 ] Rohith Sharma K S commented on YARN-6734: - [~varun_saxena] do you have any further comments on the patch? > Ensure sub-application user is extracted & sent to timeline service > --- > > Key: YARN-6734 > URL: https://issues.apache.org/jira/browse/YARN-6734 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Attachments: YARN-6734-YARN-5355.001.patch > > > After a discussion with Tez folks, we have been thinking over introducing a > table to store sub-application information. YARN-6733 > For example, if a Tez session runs for a certain period as User X and runs a > few AMs. These AMs accept DAGs from other users. Tez will execute these dags > with a doAs user. ATSv2 should store this information in a new table perhaps > called as "sub_application" table. > YARN-6733 tracks the code changes needed for table schema creation. > This jira tracks writing to that table, updating the user name fields to > include sub-application user etc. This would mean adding a field to Flow > Context which can store an additional user -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6861) Reader API for sub application entities
Rohith Sharma K S created YARN-6861: --- Summary: Reader API for sub application entities Key: YARN-6861 URL: https://issues.apache.org/jira/browse/YARN-6861 Project: Hadoop YARN Issue Type: Sub-task Components: timelinereader Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S YARN-6733 and YARN-6734 writes data into sub application table. There should be a way to read those entities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved YARN-6860. - Resolution: Duplicate > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Attachments: YARN-6860.01.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098161#comment-16098161 ] Akira Ajisaka commented on YARN-6860: - I looked YARN-5548 and probably it will fix this failure. Closing this as dup. Thanks [~rohithsharma] and [~varun_saxena]. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Attachments: YARN-6860.01.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6860: Attachment: YARN-6860.01.patch Attaching a patch to use GenericTestUtils.waitFor. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > Attachments: YARN-6860.01.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098159#comment-16098159 ] Hadoop QA commented on YARN-5548: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-5548 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-5548 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12862014/YARN-5548.0015.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16531/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Use MockRMMemoryStateStore to reduce test failures > -- > > Key: YARN-5548 > URL: https://issues.apache.org/jira/browse/YARN-5548 > Project: Hadoop YARN > Issue Type: Test >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-easy, test > Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, > YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, > YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, > YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, > YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, > YARN-5548.0015.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > Error Message > Stacktrace > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1471885197388 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 0 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 > application_state: RMAPP_FINISHED finish_time: 1471885197478> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098147#comment-16098147 ] Varun Saxena commented on YARN-5548: Bibin can you rebase the patch? Sorry for missing this. > Use MockRMMemoryStateStore to reduce test failures > -- > > Key: YARN-5548 > URL: https://issues.apache.org/jira/browse/YARN-5548 > Project: Hadoop YARN > Issue Type: Test >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-easy, test > Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, > YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, > YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, > YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, > YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, > YARN-5548.0015.patch > > > https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > Error Message > Stacktrace > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1471885197388 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 0 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 > application_state: RMAPP_FINISHED finish_time: 1471885197478> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098145#comment-16098145 ] Varun Saxena commented on YARN-6860: Sorry I had to get in YARN-5548. Missed it. Will get it in by today. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098142#comment-16098142 ] Akira Ajisaka commented on YARN-6860: - Okay, I'll check YARN-5548. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098141#comment-16098141 ] Akira Ajisaka commented on YARN-6860: - The test fails in the following code: {code} // the first app0 get kicked out from both rmContext and state store Assert.assertNull(rm2.getRMContext().getRMApps() .get(app0.getApplicationId())); Assert.assertNull(rmAppState.get(app0.getApplicationId())); {code} RMAppManager removes app0 from the rmContext via a blocking API, but removes it from the state store via a non-blocking API (see {{RMStateStore#removeApplication}} for details), so the latter assertion may fail. I think the issue can be fixed by adding a wait via {{GenericTestUtils#waitFor}}. I'll attach a patch shortly. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
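A minimal sketch of the {{GenericTestUtils#waitFor}} approach mentioned above, assuming the variable names from the quoted test snippet ({{rm2}}, {{app0}}, {{rmAppState}}) and a test method that declares {{throws Exception}}; the actual YARN-6860 patch may look different:
{code}
// The removal from the RMContext is synchronous, so this assertion is safe.
Assert.assertNull(rm2.getRMContext().getRMApps()
    .get(app0.getApplicationId()));

// The state-store removal runs asynchronously, so poll for it instead of
// asserting immediately. Requires org.apache.hadoop.test.GenericTestUtils;
// a lambda satisfies the Supplier<Boolean> parameter regardless of whether
// the Hadoop version uses the Guava or java.util.function Supplier.
GenericTestUtils.waitFor(
    () -> rmAppState.get(app0.getApplicationId()) == null,
    100 /* check interval in ms */, 5000 /* timeout in ms */);
{code}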
[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098140#comment-16098140 ] Rohith Sharma K S commented on YARN-6860: - There is already a JIRA for this test case failure, i.e. YARN-5548. > TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently > --- > > Key: YARN-6860 > URL: https://issues.apache.org/jira/browse/YARN-6860 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka > > https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ > {noformat} > java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: > 1500886835515 } application_name: "" queue: "default" priority { priority: 0 > } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 > resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" > keep_containers_across_application_attempts: false > attempt_failures_validity_interval: 0 am_container_resource_request { > priority { priority: 0 } resource_name: "*" capability { memory: 1024 > virtual_cores: 1 } num_containers: 1 relax_locality: true > node_label_expression: "" execution_type_request { execution_type: GUARANTEED > enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 > application_state: RMAPP_FINISHED finish_time: 1500886835559> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM
[ https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098138#comment-16098138 ] Rohith Sharma K S commented on YARN-6130: - bq. The intention here is to avoid updating the token in AM UGI on every allocate response. We can potentially cache the RMID and version to ensure that the version of token coming from RM is same as the one already updated by AM in its UGI. Thoughts? Even without rmIdentifiers, if the token is updated with the same rm_identifier, then the AM has to update it, right? Am I missing any particular scenario? [~jianhe] bq. what is the existing AppCollectorData#rmIdentifier used for ? This is used to handle a race condition between two NMs sending the collector address to the RM. Let's say, because of a split brain, one NM is out of sync and the application is relaunched on a different NodeManager. After the NM reconnects from the split brain, both NMs will keep sending collector data to the RM and may update the wrong collector address in the RM, which in turn makes the AM update the wrong collector address. > [ATSv2 Security] Generate a delegation token for AM when app collector is > created and pass it to AM via NM and RM > - > > Key: YARN-6130 > URL: https://issues.apache.org/jira/browse/YARN-6130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-5355-merge-blocker > Attachments: YARN-6130-YARN-5355.01.patch, > YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, > YARN-6130-YARN-5355.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
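For illustration only, a rough sketch of the "cache the RMID and token version" idea being discussed; the class and field names ({{CollectorTokenHolder}}, {{lastRmIdentifier}}, {{lastTokenVersion}}) are hypothetical and are not taken from the YARN-6130 patches:
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Hypothetical helper on the AM side: only touch the AM UGI when the
// collector token actually changed (new RM identifier or newer version).
class CollectorTokenHolder {
  private long lastRmIdentifier = -1;
  private long lastTokenVersion = -1;

  synchronized void maybeUpdate(long rmIdentifier, long tokenVersion,
      Token<?> collectorToken, UserGroupInformation amUgi) {
    if (rmIdentifier == lastRmIdentifier && tokenVersion == lastTokenVersion) {
      // Same token was already applied on a previous allocate response.
      return;
    }
    amUgi.addToken(collectorToken);
    lastRmIdentifier = rmIdentifier;
    lastTokenVersion = tokenVersion;
  }
}
{code}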
[jira] [Created] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
Akira Ajisaka created YARN-6860: --- Summary: TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently Key: YARN-6860 URL: https://issues.apache.org/jira/browse/YARN-6860 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Akira Ajisaka Assignee: Akira Ajisaka https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ {noformat} java.lang.AssertionError: expected null, but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotNull(Assert.java:664) at org.junit.Assert.assertNull(Assert.java:646) at org.junit.Assert.assertNull(Assert.java:656) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098133#comment-16098133 ] Sunil G commented on YARN-4161: --- One doubt: I think we are not checking the condition below in {{canAllocateMore}}. I might have lost context, so please correct me if I am wrong. {code} if (assignment.getAssignmentInformation().getNumReservations() == 0) { return true; } {code} > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Labels: oct16-medium > Attachments: YARN-4161.002.patch, YARN-4161.003.patch, > YARN-4161.004.patch, YARN-4161.005.patch, YARN-4161.patch, YARN-4161.patch.1 > > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine; however, in some cases it does not distribute the load > across the cluster, so the throughput of the cluster suffers. I am adding a > feature to drive this through configuration so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
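A rough sketch only of where the quoted reservation check could sit in a {{canAllocateMore}}-style method; the parameter names and the per-heartbeat limit ({{assignedContainers}}, {{maxAssignPerHeartbeat}}) are made up for illustration and are not claimed to match the actual YARN-4161 patch:
{code}
// Illustration: stop assigning more containers in the same heartbeat once
// something was reserved; otherwise respect the configured limit.
private boolean canAllocateMore(CSAssignment assignment,
    int assignedContainers, int maxAssignPerHeartbeat) {
  // Folds in the check quoted above: a reservation ends multi-assignment
  // for this node-heartbeat.
  if (assignment.getAssignmentInformation().getNumReservations() > 0) {
    return false;
  }
  // A non-positive limit means "unlimited", otherwise stay under the limit.
  return maxAssignPerHeartbeat <= 0
      || assignedContainers < maxAssignPerHeartbeat;
}
{code}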
[jira] [Commented] (YARN-6240) TestCapacityScheduler.testRefreshQueuesWithQueueDelete fails randomly
[ https://issues.apache.org/jira/browse/YARN-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098119#comment-16098119 ] Rohith Sharma K S commented on YARN-6240: - It failed recently in a branch-2 [build|https://builds.apache.org/job/PreCommit-YARN-Build/16527/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_131.txt]. [~Naganarasimha], do you have any updates on fixing this JIRA? > TestCapacityScheduler.testRefreshQueuesWithQueueDelete fails randomly > - > > Key: YARN-6240 > URL: https://issues.apache.org/jira/browse/YARN-6240 > Project: Hadoop YARN > Issue Type: Test > Components: test >Reporter: Sunil G >Assignee: Naganarasimha G R > Attachments: YARN-6240.001.patch > > > *Error Message* > Expected to NOT throw exception when refresh queue tries to delete a queue > WITHOUT running apps > Link > [here|https://builds.apache.org/job/PreCommit-YARN-Build/15092/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacityScheduler/testRefreshQueuesWithQueueDelete/] > *Stacktrace* > {code} > java.lang.AssertionError: Expected to NOT throw exception when refresh queue > tries to delete a queue WITHOUT running apps > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRefreshQueuesWithQueueDelete(TestCapacityScheduler.java:3875) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098112#comment-16098112 ] Sunil G commented on YARN-6678: --- Thanks [~Tao Yang], I will commit the patch in a day if there are no objections. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6678.001.patch, YARN-6678.002.patch, > YARN-6678.003.patch, YARN-6678.004.patch, YARN-6678.005.patch > > > Error log: > {noformat} > java.lang.IllegalStateException: Trying to reserve container > container_e10_1495599791406_7129_01_001453 for application > appattempt_1495599791406_7129_01 when currently reserved container > container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 > #containers=40 available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546) > {noformat} > Steps to reproduce this problem: > 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1 > 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and > allocated app-1/container-X2 > 3. nm1 reserved app-2/container-Y > 4. proposal-1 was accepted but threw an IllegalStateException when applying > Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is as > follows: > {code} > // Container reserved first time will be NEW, after the container > // accepted & confirmed, it will become RESERVED state > if (schedulerContainer.getRmContainer().getState() > == RMContainerState.RESERVED) { > // Set reReservation == true > reReservation = true; > } else { > // When reserve a resource (state == NEW is for new container, > // state == RUNNING is for increase container). > // Just check if the node is not already reserved by someone > if (schedulerContainer.getSchedulerNode().getReservedContainer() > != null) { > if (LOG.isDebugEnabled()) { > LOG.debug("Try to reserve a container, but the node is " > + "already reserved by another container=" > + schedulerContainer.getSchedulerNode() > .getReservedContainer().getContainerId()); > } > return false; > } > } > {code} > The reserved container on the node of a reserve proposal is checked only > for first-reserve containers. > We should also confirm that the reserved container on this node is the same as the > container being re-reserved. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
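A hedged sketch of the extra guard suggested at the end of the description, reusing the names from the quoted {{FiCaSchedulerApp#accept}} snippet ({{schedulerContainer}}, {{reReservation}}, {{LOG}}); it only illustrates the idea and is not necessarily what the committed YARN-6678 patch does:
{code}
if (schedulerContainer.getRmContainer().getState()
    == RMContainerState.RESERVED) {
  // Re-reservation: make sure the node still holds the same reserved
  // container, otherwise this proposal was built against stale state.
  RMContainer nodeReserved =
      schedulerContainer.getSchedulerNode().getReservedContainer();
  if (nodeReserved == null || !nodeReserved.getContainerId().equals(
      schedulerContainer.getRmContainer().getContainerId())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Try to re-reserve a container, but the node now holds a "
          + "different reservation, rejecting the proposal");
    }
    return false;
  }
  reReservation = true;
} else {
  // First reservation: keep the existing check that the node is not
  // already reserved by someone else (as in the quoted snippet).
  if (schedulerContainer.getSchedulerNode().getReservedContainer() != null) {
    return false;
  }
}
{code}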
[jira] [Updated] (YARN-6102) RMActiveService context to be updated with new RMContext on failover
[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6102: Attachment: YARN-6102-branch-2.002.patch Updated the branch-2 patch fixing the findbugs warning. The test failures are unrelated to the patch, and open JIRAs already exist for them. > RMActiveService context to be updated with new RMContext on failover > > > Key: YARN-6102 > URL: https://issues.apache.org/jira/browse/YARN-6102 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 2.7.3 >Reporter: Ajith S >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, > YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, > YARN-6102.06.patch, YARN-6102.07.patch, YARN-6102-branch-2.001.patch, > YARN-6102-branch-2.002.patch > > > {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in > dispatcher thread > java.lang.Exception: No handler for registered for class > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120) > at java.lang.Thread.run(Thread.java:745) > 2017-01-17 16:42:17,914 INFO [AsyncDispatcher ShutDown handler] > event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code} > I also noticed the same stack when {{TestResourceTrackerOnHA}} exits > abnormally; after some analysis, I was able to reproduce it. > Once the nodeHeartBeat is sent to the RM, inside > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}}, > before it is sent to the dispatcher through > {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} > an RM failover can be called, and the dispatcher is reset. > The new dispatcher is, however, first started and then the events are > registered, at > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}} > So the event order will look like: > 1. Send a node heartbeat to {{ResourceTrackerService}} > 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing it to the dispatcher, > call RM failover > 3. In the RM failover, the current active will reset the dispatcher in reinitialize, i.e. ( > {{resetDispatcher();}} + {{createAndInitActiveServices();}} ) > Now, between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, > {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher. > This causes the above error because, at the point when the {{STATUS_UPDATE}} > event is given to the dispatcher in {{ResourceTrackerService}}, the new > dispatcher (from the failover) may be started but not yet registered for events. > Using the same steps (pausing the JVM in debug), I was able to reproduce this in a > production cluster as well: for a {{STATUS_UPDATE}} active service event, the > service has yet to forward the event to the RM dispatcher when a failover is called > and the dispatcher reset falls between {{resetDispatcher();}} and > {{createAndInitActiveServices();}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6788) Improve performance of resource profile branch
[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6788: -- Attachment: YARN-6788-YARN-3926.011.patch Thanks [~leftnoteasy]. Uploading a patch addressing the latest comments. bq.Additional items (performance related) Yes, I think these figures make sense. I ran a JMeter analysis test on all the APIs of the Resources class. There is some performance dip; however, some cost is expected from the extra resource objects. That aside, I was planning to add these test cases in a new patch (with some compile-time flag to trigger the performance tests). bq.Additional items (non-performance related) Sure, I'll handle these in another patch. > Improve performance of resource profile branch > -- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G >Priority: Blocker > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch > > > Currently we could see a 15% performance delta with this branch. > A few performance improvements are needed to address the same. > Also, this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
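As an aside, one common way to make such performance tests opt-in is to gate them on a JVM flag rather than a compile-time switch; the property name and test class below are purely hypothetical and not part of any attached patch:
{code}
import org.junit.Assume;
import org.junit.Before;
import org.junit.Test;

// Hypothetical example: runs only when -Dyarn.resource.perf.tests=true is set.
public class TestResourcesPerformance {

  @Before
  public void requirePerfFlag() {
    // Skips (does not fail) the test when the flag is absent.
    Assume.assumeTrue(Boolean.getBoolean("yarn.resource.perf.tests"));
  }

  @Test
  public void testResourceOpsThroughput() {
    long start = System.nanoTime();
    // exercise the Resources APIs under test here, e.g. in a tight loop
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("Resources ops took " + elapsedMs + " ms");
  }
}
{code}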
[jira] [Commented] (YARN-6859) Add missing test scope to the zookeeper dependency in hadoop-yarn-server-resourcemanager test-jar
[ https://issues.apache.org/jira/browse/YARN-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098103#comment-16098103 ] Hadoop QA commented on YARN-6859: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 45s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6859 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878583/add_test_scope.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml | | uname | Linux fa77db9524f5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 770cc46 | | Default Java | 1.8.0_131 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16528/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16528/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add missing test scope to the zookeeper dependency in > hadoop-yarn-server-resourcemanager test-jar > - > > Key: YARN-6859 > URL: https://issues.apache.org/jira/browse/YARN-6859 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Minor > Attachments: add_test_scope.patch > > > Reported by Sean Mackrory
[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover
[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098087#comment-16098087 ] Hadoop QA commented on YARN-6102: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 137 unchanged - 9 fixed = 137 total (was 146) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 33s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Unused field:ResourceManager.java | | JDK v1.8.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | JDK v1.7.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5e40efe | | JIRA Issue | YARN-6102 | | JIRA
[jira] [Assigned] (YARN-6859) Add missing test scope to the zookeeper dependency in hadoop-yarn-server-resourcemanager test-jar
[ https://issues.apache.org/jira/browse/YARN-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned YARN-6859: --- Assignee: Akira Ajisaka > Add missing test scope to the zookeeper dependency in > hadoop-yarn-server-resourcemanager test-jar > - > > Key: YARN-6859 > URL: https://issues.apache.org/jira/browse/YARN-6859 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Minor > Attachments: add_test_scope.patch > > > Reported by Sean Mackrory in [common-dev > ML|http://markmail.org/message/3wvcdwcoyas2255f]. When compiling Apache > Hadoop with {{-Dzookeeper.version=3.5.3-beta}}, the build fails by the > following error: > {noformat} > [WARNING] > Dependency convergence error for org.apache.zookeeper:zookeeper:3.5.3-beta > paths to dependency are: > +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.0.0-beta1-SNAPSHOT > +-org.apache.zookeeper:zookeeper:3.5.3-beta > ... > and > +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT > +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:3.0.0-beta1-SNAPSHOT > +-org.apache.zookeeper:zookeeper:3.5.3-beta > and > +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT > +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:3.0.0-beta1-SNAPSHOT > +-org.apache.zookeeper:zookeeper:3.4.9 > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org