[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716178#comment-14716178 ] Inigo Goiri commented on YARN-1012: --- I think this is very YARN specific. It relies on the ResourceCalculator and so on which come from Common though. Regarding adding network and disk usage, I fully agree. You guys should first extend ResourceUtilization (as done in this patch) to support disk and network and then extend the node resource monitor (YARN-3534) to collect it from the node. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
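As a rough illustration of the direction described in the comment above (extending ResourceUtilization to carry disk and network alongside memory and CPU), here is a minimal, hedged sketch; the class name, field names, and units below are assumptions for illustration only, not the actual patch or the YARN API.

{code}
/**
 * Sketch of a utilization record extended with disk and network, in the
 * spirit of extending ResourceUtilization. Names and units are illustrative
 * assumptions, not real YARN code.
 */
public final class ExtendedResourceUtilization {
  private int physicalMemoryMB;   // physical memory used, in MB
  private float cpu;              // CPU used, as a fraction of vcores
  private long diskReadBytes;     // bytes read from local disks
  private long diskWriteBytes;    // bytes written to local disks
  private long networkRxBytes;    // bytes received on the network
  private long networkTxBytes;    // bytes transmitted on the network

  public ExtendedResourceUtilization(int pmemMB, float cpu,
      long diskRead, long diskWrite, long netRx, long netTx) {
    this.physicalMemoryMB = pmemMB;
    this.cpu = cpu;
    this.diskReadBytes = diskRead;
    this.diskWriteBytes = diskWrite;
    this.networkRxBytes = netRx;
    this.networkTxBytes = netTx;
  }

  /** Accumulate another sample, e.g. when aggregating per-container usage. */
  public void add(ExtendedResourceUtilization other) {
    this.physicalMemoryMB += other.physicalMemoryMB;
    this.cpu += other.cpu;
    this.diskReadBytes += other.diskReadBytes;
    this.diskWriteBytes += other.diskWriteBytes;
    this.networkRxBytes += other.networkRxBytes;
    this.networkTxBytes += other.networkTxBytes;
  }
}
{code}

The node resource monitor from YARN-3534 would then populate the extra fields the same way it populates memory and CPU today.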
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716162#comment-14716162 ] Hadoop QA commented on YARN-3920: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 3m 37s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752667/YARN-3920.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4cbbfa2 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8924/console | This message was automatically generated. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, > yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716153#comment-14716153 ] Jian He commented on YARN-2884: --- Looks good to me overall; I think there are still some problems with the AMRMProxyToken implementation. Basically, a long-running service may not work with the AMRMProxy.

1) The code below in DefaultRequestInterceptor should create and return a new AMRMProxyToken in the final returned allocate response when needed. Otherwise, the AM will fail to talk with the AMRMProxy after the key is rolled over in the AMRMProxyTokenSecretManager.

{code}
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Forwarding allocate request to the real YARN RM");
  }
  AllocateResponse allocateResponse = rmClient.allocate(request);
  if (allocateResponse.getAMRMToken() != null) {
    updateAMRMToken(allocateResponse.getAMRMToken());
  }
  return allocateResponse;
}
{code}

The code below in ApplicationMasterService#allocate shows how that is done.

{code}
if (nextMasterKey != null
    && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier.getKeyId()) {
  RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl) appAttempt;
  Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
  if (nextMasterKey.getMasterKey().getKeyId() != appAttemptImpl.getAMRMTokenKeyId()) {
    LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
        + " to application: " + applicationId);
    amrmToken = rmContext.getAMRMTokenSecretManager()
        .createAndGetAMRMToken(appAttemptId);
    appAttemptImpl.setAMRMToken(amrmToken);
  }
  allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
      .newInstance(amrmToken.getIdentifier(), amrmToken.getKind().toString(),
          amrmToken.getPassword(), amrmToken.getService().toString()));
}
{code}

2) Some methods inside the AMRMProxyTokenSecretManager are not used at all. We may remove them?

3) I think we need at least one end-to-end test for this. We can use MiniYarnCluster to simulate the whole thing: the AM talks with the AMRMProxy, which talks with the RM to register/allocate/finish. In the test, we should also reduce RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS so that we can simulate the token renewal behavior. I'm OK with having a separate JIRA to track the end-to-end test, as this is a bit of work.

> Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, > YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, > YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
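For reference, a hedged sketch of what point 1 could look like inside DefaultRequestInterceptor#allocate: after forwarding the call to the real RM, check whether the proxy's own master key has rolled over and, if so, mint and attach a fresh AMRMProxy token, mirroring the ApplicationMasterService code quoted above. The fields and methods proxyTokenSecretManager, getCurrentMasterKeyId(), and amKeyId are assumptions for illustration, not the actual AMRMProxyTokenSecretManager API.

{code}
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  AllocateResponse allocateResponse = rmClient.allocate(request);
  if (allocateResponse.getAMRMToken() != null) {
    // Keep absorbing the real RM's token roll-overs, as the current code does.
    updateAMRMToken(allocateResponse.getAMRMToken());
  }
  // Sketch only -- proxyTokenSecretManager, getCurrentMasterKeyId() and
  // amKeyId (the key id of the AMRMProxy token the AM currently holds) are
  // assumed members, not the real AMRMProxyTokenSecretManager API.
  if (proxyTokenSecretManager.getCurrentMasterKeyId() != amKeyId) {
    org.apache.hadoop.security.token.Token<AMRMTokenIdentifier> proxyToken =
        proxyTokenSecretManager.createAndGetAMRMToken(appAttemptId);
    // Hand the AM a freshly minted AMRMProxy token, mirroring what
    // ApplicationMasterService#allocate does for regular AMRMTokens.
    allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
        .newInstance(proxyToken.getIdentifier(), proxyToken.getKind().toString(),
            proxyToken.getPassword(), proxyToken.getService().toString()));
  }
  return allocateResponse;
}
{code}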
[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716146#comment-14716146 ] Jian He commented on YARN-4083: --- One other thing to think about: what if the NM dies, should the AM fall back to the RM? Also, in case of RM HA, there will be multiple RM scheduler addresses; simply swapping out a single scheduler address will not work. > Add a discovery mechanism for the scheduler address > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc. rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch retrigger > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, > yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716119#comment-14716119 ] Hadoop QA commented on YARN-3920: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 3m 39s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752657/YARN-3920.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4cbbfa2 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8923/console | This message was automatically generated. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4074: -- Attachment: YARN-4074-YARN-2928.POC.001.patch Posting a v.1 POC patch. This implements the first query (the flow activity query). I'll follow it up with another one tomorrow that implements the second one too. This is to get the design choices and correctness reviewed first. It does the following: (1) includes the flow activity query as part of getEntities(), and (2) creates a data container for the flow activity table called FlowActivityEntity. It probably needs a fair amount of refactoring to make the reader code more manageable. Also, I need to add unit tests. They will come later. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4074-YARN-2928.POC.001.patch > > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch Updated configuration in FairSchedulerConfiguration > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch Attaching a patch based on the multiple of increment approach > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, yARN-3920.001.patch, > yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716062#comment-14716062 ] Srikanth Kandula commented on YARN-1011: +1 > [Umbrella] RM should dynamically schedule containers based on utilization of > currently allocated containers > --- > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4088) RM should be able to process heartbeats from NM asynchronously
Srikanth Kandula created YARN-4088: -- Summary: RM should be able to process heartbeats from NM asynchronously Key: YARN-4088 URL: https://issues.apache.org/jira/browse/YARN-4088 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Srikanth Kandula Today, the RM sequentially processes one heartbeat after another. Imagine a 3000-server cluster with each server heart-beating every 3s. This gives the RM 1ms on average to process each NM heartbeat. That is tough. It is true that there are several underlying data structures that will be touched during heartbeat processing, so it is non-trivial to parallelize the NM heartbeat. Yet, it is quite doable... Parallelizing the NM heartbeat would substantially improve the scalability of the RM, allowing it to either a) run larger clusters, b) support faster heartbeats or dynamic scaling of heartbeats, c) take more asks from each application, or d) use cleverer/more expensive algorithms such as node labels or better packing. Indeed, the RM's scalability limit has been cited as the motivating reason for a variety of efforts which will become less needed if this can be solved. Ditto for slow heartbeats. See the Sparrow and Mercury papers for example. Can we take a shot at this? If not, could we discuss why? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
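To make the proposal concrete, here is a minimal, hypothetical sketch of asynchronous heartbeat handling: the per-node work is fanned out to a thread pool, and only the commit into the shared scheduler data structures is serialized. None of these class or method names exist in YARN today; they are assumptions for illustration only.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: fan NM heartbeat processing out to a pool of workers. */
public class AsyncNodeHeartbeatProcessor {
  private final ExecutorService pool =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  private final Object schedulerLock = new Object();

  /** Called by the RPC layer for every NM heartbeat. */
  public void onHeartbeat(NodeStatus status) {
    pool.submit(() -> {
      // Parallel part: decode the report, update per-node bookkeeping,
      // compute container completions -- no shared scheduler state touched.
      NodeUpdate update = preprocess(status);
      // Serialized part: only the final commit into the shared scheduler
      // data structures is done under the lock.
      synchronized (schedulerLock) {
        commitToScheduler(update);
      }
    });
  }

  private NodeUpdate preprocess(NodeStatus status) { /* ... */ return new NodeUpdate(); }
  private void commitToScheduler(NodeUpdate update) { /* ... */ }

  // Placeholder types standing in for the real NM report objects.
  public static class NodeStatus {}
  public static class NodeUpdate {}
}
{code}

The hard part, as the description notes, is shrinking the serialized section; the sketch only shows where the boundary would sit.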
[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716034#comment-14716034 ] Srikanth Kandula commented on YARN-1011: This is a great idea. Is there an ETA for this? Could you comment on whether it is being deprioritized for some reason? > [Umbrella] RM should dynamically schedule containers based on utilization of > currently allocated containers > --- > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716032#comment-14716032 ] Srikanth Kandula commented on YARN-3980: +1 this would be very useful to have... Will enable even better packing. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716031#comment-14716031 ] Srikanth Kandula commented on YARN-3534: [~elgoiri], [~kasha], could you comment on extending this to also take in network and disk usage information? > Collect memory/cpu usage on the node > > > Key: YARN-3534 > URL: https://issues.apache.org/jira/browse/YARN-3534 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.0 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-3534-1.patch, YARN-3534-10.patch, > YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, > YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, > YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, > YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, > YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, > YARN-3534-9.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > YARN should be aware of the resource utilization of the nodes when scheduling > containers. For this, this task will implement the collection of memory/cpu > usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716029#comment-14716029 ] Srikanth Kandula commented on YARN-1012: [~elgoiri], [~kasha] Could you comment on whether this should go into Hadoop Common? Also, it may be worthwhile to extend this to also account for the network and disk usage of the containers... See HADOOP-12210. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716023#comment-14716023 ] Srikanth Kandula commented on YARN-2745: [~aw] Done by [~chris.douglas]! > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716021#comment-14716021 ] Srikanth Kandula commented on YARN-2745: [~vinodkv] Thanks for the related JIRAs. The efforts are complementary. Indeed, adapting assignment based on dynamic usage would be a good thing to have. This JIRA is more about packing based on anticipated usage as indicated by the ask. Dynamic packing would be even better. > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716019#comment-14716019 ] Srikanth Kandula commented on YARN-2745: Just a brief update on this JIRA... 1) [~chris.douglas] pushed through "collection" of network and disk usage to Hadoop Common. See HADOOP-12210. 2) [~elgoiri] and [~kasha], in YARN-3534 and YARN-3980, collect CPU and memory info for containers, push that information from the NM to the RM, and make it available to the scheduler. 3) Packing requires the scheduler to look past the first "schedulable" task discovered by the capacity scheduler loop. Based on the feedback above, we have decoupled the architectural change needed from the actual packing policy. See YARN-4056, called bundling. Many different packing policies are allowed in the bundle. 4) These changes are complementary and orthogonal to YARN-1011. That JIRA recommends, rightly, adapting RM allocation based on the dynamic resource usage of the allocated containers. This JIRA is more about packing containers. It currently does so based on expected resource usage as indicated in the ask. Indeed, packing based on dynamic usage information would be strictly better and is left for future work. > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715998#comment-14715998 ] Rohith Sharma K S commented on YARN-3250: - Thanks Sunil G for reviewing the patch. The test case failures are unrelated to this patch! > Support admin cli interface for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715991#comment-14715991 ] Srikanth Kandula commented on YARN-4081: Extending to multiple resources is great, but why use a Map? Is there a rough idea of how many different resources one may want to encode? It seems overkill to incur so much additional overhead if, say, all that is needed is a handful more resources. Ditto for encapsulating strings in URIs and the ResourceInformation wrapper over doubles. It would perhaps be okay if this data structure were used less often, but if I understand correctly, a Resource is created/destroyed at least once per ask/assignment and often many more times... > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
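To make the overhead concern concrete, here is a rough, self-contained comparison (not the YARN-4081 patch code) of a map-backed resource versus a plain-fields resource; the per-instance HashMap, boxing, and entry allocations on the map side are the costs the comment is pointing at when a Resource object is built for every ask and assignment.

{code}
import java.util.HashMap;
import java.util.Map;

public class ResourceLayouts {
  /** Map-backed: flexible, but each instance allocates a HashMap plus
   *  boxed values and entry objects -- costly if built per ask/assignment. */
  static class MapResource {
    private final Map<String, Long> values = new HashMap<>();
    void set(String name, long value) { values.put(name, value); }
    long get(String name) { return values.getOrDefault(name, 0L); }
  }

  /** Plain fields: fixed set of resources, but a single small object with
   *  no boxing and no per-entry allocations. */
  static class FieldResource {
    long memoryMB;
    long vcores;
    long diskIO;     // hypothetical extra resource
    long networkIO;  // hypothetical extra resource
  }

  public static void main(String[] args) {
    MapResource m = new MapResource();
    m.set("memory-mb", 2048);
    m.set("vcores", 2);

    FieldResource f = new FieldResource();
    f.memoryMB = 2048;
    f.vcores = 2;
    System.out.println(m.get("memory-mb") + " " + f.memoryMB);
  }
}
{code}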
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715892#comment-14715892 ] Hadoop QA commented on YARN-4087: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 58s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 10s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | | | 46m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752615/YARN-4087.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f44b599 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8922/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8922/console | This message was automatically generated. > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715872#comment-14715872 ] Bibin A Chundatt commented on YARN-4087: So by default in yarn-default.xml we would have yarn.resourcemanager.fail-fast=true and yarn.fail-fast=false. In YarnConfiguration:

{code}
public static boolean shouldRMFailFast(Configuration conf) {
  return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
      conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
          YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
}
{code}

Some mismatch, right? No plans to change YarnConfiguration.RM_FAIL_FAST?

> Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
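A small, hedged illustration of the precedence encoded in shouldRMFailFast above: yarn.resourcemanager.fail-fast, when set, wins, and only the unset case falls back to yarn.fail-fast. The sketch uses an empty Configuration so that only the explicitly set values matter; it does not assert what yarn-default.xml actually ships.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FailFastPrecedence {
  public static void main(String[] args) {
    // Empty configuration (no default resources loaded), so only the values
    // we set below matter; this isolates the precedence logic.
    Configuration conf = new Configuration(false);

    // Only the generic switch set: the RM falls back to yarn.fail-fast.
    conf.setBoolean(YarnConfiguration.YARN_FAIL_FAST, false);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // false

    // RM-specific switch set: it takes precedence over yarn.fail-fast.
    conf.setBoolean(YarnConfiguration.RM_FAIL_FAST, true);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // true
  }
}
{code}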
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715829#comment-14715829 ] Karthik Kambatla commented on YARN-4087: +1, if fail-fast hasn't been in any prior release and we are not drastically altering the behavior. In any case, it would be nice to release note this new behavior for 2.8.0. > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4087: -- Attachment: YARN-4087.1.patch simple patch which flips the config > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715809#comment-14715809 ] Li Lu commented on YARN-3816: - Hi [~djp], I briefly looked at the patch and have one quick question: in the application table, we no longer store the type of the incoming entities, IIUC. All entity types from the application table will be added in HBaseReader, as in:

{code}
String entityType = isApplication
    ? TimelineEntityType.YARN_APPLICATION.toString()
    : EntityColumn.TYPE.readResult(result).toString();
{code}

In this case, maybe we're missing YARN_APPLICATION_AGGREGATION types and can no longer differentiate them? Or is there any other way we can recognize whether an entity comes from the application itself or from aggregation? (Am I missing anything?)

> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4087: -- Summary: Set YARN_FAIL_FAST to be false by default (was: Set RM_FAIL_FAST to be false by default) > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4087) Set RM_FAIL_FAST to be false by default
Jian He created YARN-4087: - Summary: Set RM_FAIL_FAST to be false by default Key: YARN-4087 URL: https://issues.apache.org/jira/browse/YARN-4087 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Increasingly, I feel setting this property to be false makes more sense especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715765#comment-14715765 ] Sangjin Lee commented on YARN-4074: --- I am about 90% done with the POC patch for this. I'm shooting for some time tomorrow to be able to post the patch. In the meantime, in order to enable [~varun_saxena] and others to make progress, the following is the proposal that I'm implementing. Please *do* let me know if you have any questions or issues with the proposal so we can adjust accordingly. (REST API) In order to support the POC UI, we will implement 2 new queries: # given the cluster, return the N most recent flows from the flow activity table # given the cluster, user, flow id, and flow run id, return the flow run (with metrics) from the flow run table At the REST level, they can be represented as follows for example: # /listFlows/clusterId?limit=100 # /flow/clusterId/userId/flowName/flowRun (UI) With these URLs, the UI can invoke the first URL to render the landing page with the table. The REST output contains the flow activity records along with all the flow runs that were active during the day. If the user drills down on a single flow, then the client side can generate the second query against all the flow runs for that flow to fetch the metrics at the flow run level. If the user further drills down into a single flow run, then it can do an (existing) query to retrieve all applications for a given flow run to get the application entities. (reader interface) Currently I am *not* planning to add new flow-specific methods to the {{TimelineReader}} interface. Instead, you can use the existing {{getEntities()}} and {{getEntity()}} methods to perform the above new queries: # {{getEntities()}} with cluster specified and entity type = YARN_FLOW_ACTIVITY (a new timeline entity type) # {{getEntity()}} with cluster, user, flow id, flow run id specified and entity type = YARN_FLOW > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
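For the REST side, a hedged client sketch against the two proposed endpoints; only the URL shapes come from the proposal above, while the timeline reader host/port, the example cluster/user/flow/run values, and the idea of consuming the response as plain text are assumptions for illustration.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FlowQueryClient {
  // Hypothetical timeline reader endpoint; host and port are assumptions.
  private static final String BASE = "http://timelinereader.example.com:8188";

  static String get(String path) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BASE + path).openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      StringBuilder body = new StringBuilder();
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
      return body.toString();
    }
  }

  public static void main(String[] args) throws Exception {
    // 1) N most recent flows for a cluster (flow activity query).
    String flows = get("/listFlows/myCluster?limit=100");
    // 2) A single flow run, with metrics, for cluster/user/flow/run.
    String flowRun = get("/flow/myCluster/alice/wordcount/1409550134000");
    System.out.println(flows.length() + " " + flowRun.length());
  }
}
{code}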
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715754#comment-14715754 ] Varun Saxena commented on YARN-3528: # In TestNodeStatusUpdater#createNMConfig, the change has been missed; the port is still hardcoded.

{code}
conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":12346");
{code}

# In TestContainer, the port is just used for creating a container token. No need to call ServerSocketUtil#getPort. # Nit: in TestNodeManagerShutdown#startContainer, the commented-out line below can be removed.

{code}
//final int port = ServerSocketUtil.getPort(49156, 10);
{code}

# As you will be changing other things, maybe change the code below as well. In TestNodeManagerShutdown I don't see any need to add a try-catch block here; we have just replaced 12345 with a passed port.

{code}
-InetSocketAddress containerManagerBindAddress =
-    NetUtils.createSocketAddrForHost("127.0.0.1", 12345);
+InetSocketAddress containerManagerBindAddress = null;
+try {
+  containerManagerBindAddress = NetUtils.createSocketAddrForHost("127.0.0.1", port);
+} catch (Exception e) {
+  throw new RuntimeException("Fail To Get the Port");
+}
{code}

Other things look fine.

> Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
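As a companion to the review points above, a small sketch of the pattern the patch is moving toward: instead of hard-coding 12345/12346, a test asks ServerSocketUtil (the existing Hadoop test utility referenced in point 2) for a free port and builds the NM addresses from it. The base ports and retry counts below are arbitrary illustrative values.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.ServerSocketUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TestPortAllocation {
  /** Build an NM config with dynamically chosen ports instead of 12345/12346. */
  static Configuration createNMConfig() throws IOException {
    // Start probing at an arbitrary base port and retry a few times if busy.
    int nmPort = ServerSocketUtil.getPort(49152, 10);
    int localizerPort = ServerSocketUtil.getPort(49160, 10);

    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:" + nmPort);
    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
        "127.0.0.1:" + localizerPort);
    return conf;
  }
}
{code}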
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715681#comment-14715681 ] Varun Saxena commented on YARN-3528: Will have a look. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715641#comment-14715641 ] Robert Kanter commented on YARN-3528: - +1 LGTM. Any other comments [~varun_saxena]? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715591#comment-14715591 ] Ben Podgursky commented on YARN-2962: - Got it. Thanks for the details. It sounds like we'll have some workarounds available if we do run into trouble, which is hopefully good enough for now. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits
[ https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715561#comment-14715561 ] Craig Condit commented on YARN-4048: Just my two cents: Using cgroups on CentOS/RHEL 6.x is asking for it... We've experienced similar crashes using anything that utilizes cgroups, not just YARN (for example -- docker). Cgroups is widely regarded as unstable in Linux kernel versions < 3.10 or so. > Linux kernel panic under strict CPU limits > -- > > Key: YARN-4048 > URL: https://issues.apache.org/jira/browse/YARN-4048 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chengbing Liu >Priority: Critical > Attachments: panic.png > > > With YARN-2440 and YARN-2531, we have seen some kernel panics happening under > heavy pressure. Even with YARN-2809, it still panics. > We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I > guess the latest version also has the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715545#comment-14715545 ] Hadoop QA commented on YARN-4086: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 4 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 25s | The applied patch generated 3 new checkstyle issues (total was 23, now 26). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 57s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | | | 53m 23s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.cli.TestLogsCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752560/YARN-4086.001.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / a4d9acc | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8921/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8921/console | This message was automatically generated. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715519#comment-14715519 ] Varun Saxena commented on YARN-2962: bq. how many applications did you have in the RM store before this became a problem Will have to check. I think it was more than 1 apps in our case. Will let you know. bq. switching the zk max message size via -Djute.maxbuffer= a viable workaround? Yes, that works. Also, we can set a lower config value for the number of completed apps to be stored in the state store. Even 0 can be set. bq. Also, is there a sense of how close this ticket is to being merged? The patches currently here have to be rebased because of recent changes. Had put this on the back burner as this will go into trunk and not branch-2. If it's required to be handled earlier, I will focus on it. I plan to take this up in the coming month anyway. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4077) FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation
[ https://issues.apache.org/jira/browse/YARN-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4077: --- Component/s: fairscheduler > FairScheduler Reservation should wait for most relaxed scheduling delay > permitted before issuing reservation > > > Key: YARN-4077 > URL: https://issues.apache.org/jira/browse/YARN-4077 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot > > Today if an allocation has a node local request that allows for relaxation, > we do not wait for the relaxation delay before issuing the reservation. This > can be too aggressive. Instead we should allow the scheduling delays of > relaxation to expire before we choose to allow reserving a node for the > container. This allows for the request to be satisfied on a different node > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4076) FairScheduler does not allow AM to choose which containers to preempt
[ https://issues.apache.org/jira/browse/YARN-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4076: --- Component/s: fairscheduler > FairScheduler does not allow AM to choose which containers to preempt > - > > Key: YARN-4076 > URL: https://issues.apache.org/jira/browse/YARN-4076 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > > Capacity scheduler allows for AM to choose which containers will be > preempted. See comment about corresponding work pending for FairScheduler > https://issues.apache.org/jira/browse/YARN-568?focusedCommentId=13649126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13649126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4086: Attachment: YARN-4086.001.patch The YARN-4086.001.patch allows the yarn CLI and web UIs to read aggregated logs from HAR files. It's mostly the same as the prelim patch in MAPREDUCE-6415, with some minor changes and unit tests. The patches for this and MAPREDUCE-6415 can be applied independently. *Important:* For the unit tests, I had to include some HAR files, which are basically folders with a few files in them. One of the files is a binary file, which makes generating and applying the patch tricky. I got it to work by generating it with {{git diff --binary > FILE}} and applying it with {{git apply}}. The regular {{patch}} command won't work, and it has to be {{-p1}} and not {{-p0}}. I'm not sure if Jenkins will be able to handle this. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4086) Allow Aggregated Log readers to handle HAR files
Robert Kanter created YARN-4086: --- Summary: Allow Aggregated Log readers to handle HAR files Key: YARN-4086 URL: https://issues.apache.org/jira/browse/YARN-4086 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429 ] Allen Wittenauer edited comment on YARN-4084 at 8/26/15 8:10 PM: - So then "mvn compile" is actually what you want (I think) :) was (Author: aw): So then "mvn compile" is actually what you want > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429 ] Allen Wittenauer commented on YARN-4084: So then "mvn compile" is actually what you want > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715393#comment-14715393 ] Hadoop QA commented on YARN-4082: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 1s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 12s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 33s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 6s | The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 51s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 31s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 1s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752539/YARN-4082.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8920/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8920/console | This message was automatically generated. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch, YARN-4082.2.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715342#comment-14715342 ] Ben Podgursky commented on YARN-2962: - Hi, We're looking at switching to a HA RM and I'm a bit concerned about this ticket, since we have a very active RM. Couple questions for those who encountered the bug: - how many applications did you have in the RM store before this became a problem? - was switching the zk max messages size via -Djute.maxbuffer= a viable workaround? Also, is there a sense of how close this ticket is to being merged? Thanks, Ben > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715341#comment-14715341 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 25m 15s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 1s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 43s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 21s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager compilation is broken. | | {color:green}+1{color} | findbugs | 6m 21s | The patch does not introduce any new Findbugs (version ) warnings. | | {color:red}-1{color} | yarn tests | 0m 17s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 0m 12s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 13s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 0m 13s | Tests failed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 18s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 60m 41s | | \\ \\ || Reason || Tests || | Failed build | hadoop-yarn-api | | | hadoop-yarn-client | | | hadoop-yarn-common | | | hadoop-yarn-server-applicationhistoryservice | | | hadoop-yarn-server-common | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752534/YARN-3717.20150826-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / a4d9acc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8919/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8919/console | This message was automatically generated. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch >
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715338#comment-14715338 ] Ved Prakash Pandey commented on YARN-4084: -- I realize that my patch forces the use of the {{-Penable-yarn-server-test-module}} option for normal builds. This is my bad. Instead, I will provide a patch tomorrow that has a switch like {{-Pdisable-yarn-server-test-module}}, with which the hadoop-yarn-server-tests project can be skipped from the build. Please let me know if that sounds ok. > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715306#comment-14715306 ] Ved Prakash Pandey commented on YARN-4084: -- Thanks for the reply, Allen. Actually, I am using -DskipTests in addition to -Dmaven.test.skip=true. The problem comes when I use -Dmaven.test.skip=true, which skips the test code compilation. For clarity: the maven.test.skip option skips both test code compilation and test case execution, whereas skipTests skips only execution (not compilation). It may sound questionable to compile the source code without compiling the test sources, and in fact for the open source community this may never be a scenario. But I ran into a requirement where, in my Continuous Integration environment, I have to make a complete build as fast as possible and every minute counts. In such a case, disabling test code compilation saves close to 3 to 4 minutes. > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
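To make the distinction above concrete, the three Maven modes look roughly like this (module selection and other options omitted):
{noformat}
# compile everything and run the tests (default)
mvn install

# compile the test sources but skip running them
mvn install -DskipTests

# skip test compilation and test execution entirely
# (the mode that currently breaks the hadoop-yarn-project build)
mvn install -Dmaven.test.skip=true
{noformat}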
[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4082: - Attachment: YARN-4082.2.patch Attached .2 patch, fixed findbugs warnings. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch, YARN-4082.2.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4085) Generate file with container resource limits in the container work dir
Varun Vasudev created YARN-4085: --- Summary: Generate file with container resource limits in the container work dir Key: YARN-4085 URL: https://issues.apache.org/jira/browse/YARN-4085 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Currently, a container doesn't know what resource limits are being imposed on it. It would be helpful if the NM generated a simple file in the container work dir with the resource limits specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
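A purely illustrative sketch of what such a file could look like; the file name, format, and keys below are hypothetical, not the proposed design:
{code}
// Hypothetical helper: write the limits imposed on a container into its work dir.
// The file name ("container-limits.properties") and property keys are made up
// for illustration only.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ContainerLimitsWriter {
  public static void writeLimits(String containerWorkDir, long memoryMb, int vcores)
      throws IOException {
    Properties limits = new Properties();
    limits.setProperty("memory.mb", Long.toString(memoryMb));
    limits.setProperty("vcores", Integer.toString(vcores));
    try (FileOutputStream out = new FileOutputStream(
        containerWorkDir + "/container-limits.properties")) {
      limits.store(out, "Resource limits imposed on this container");
    }
  }
}
{code}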
[jira] [Updated] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3717: Attachment: YARN-3717.20150826-1.patch Fixed the reported testcase failure; also ran findbugs locally and didn't find any issues induced by the code. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715222#comment-14715222 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745158/YARN-3635.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8918/console | This message was automatically generated. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715208#comment-14715208 ] Vrushali C commented on YARN-4074: -- My take is that we can make things as generic as possible, but we should have separate APIs for flows and flow runs. I had put up an initial proposal for flow-based queries in ATS when we started off on this at https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx I believe for the two queries you have listed above [~sjlee0], there would be two REST APIs: 1) Get All Flows Path: /listFlows// Returns: paginated list of apps with aggregated stats (to populate the flows list tab on the UI) Sample URL: http://timelineservice.example.com/ws/v2/listFlows/clusterid?limit=2&&startTime=20140510&endTime=20140601 This would be a UI-related aggregation query 2) Get specific Flow's runs Path: /flow[version] Returns: list of flows Sample URL: http://timelineservice.example.com/ws/v2/flow/clusterid/userName/someFlowName_idenitying_a_flow?limit=2&&startTime=1390939248000&endTime=139361764800 > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715192#comment-14715192 ] Vrushali C commented on YARN-4053: -- The way I see this, it comes down to a basic question of whether we really *need* floating point precision in metric values. For instance, cost is a metric which could have a decimal value upon calculation. But, in my opinion, a cost of 5 dollars versus 5.347891 dollars versus 5.78913 dollars is not that different. A cost of 6.x dollars is different from 5.x. I believe it does not matter THAT much whether the cost is 5.347891 or 5.79813. These are Hadoop applications; the run duration is rarely going to be exactly consistent for the exact same code, so metrics will usually fluctuate slightly between different runs of the exact same job. Storage and querying of Longs is straightforward and clean, with no ambiguity in serialization. Contrast that with storing various numerical data types in metrics: - all the complexity of storing column prefixes that tell us which type is stored, so that serialization to/from HBase can be done correctly. - filtering in HBase becomes much more complicated with all these different datatypes. > Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
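As a sketch of the serialization point (assuming hbase-common's {{Bytes}} utility is available; this is not the YARN-4053 patch), collapsing every metric value to a long yields one fixed 8-byte encoding that both writers and HBase filters can rely on:
{code}
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: encode all metric values as longs so the stored bytes always
// have the same, unambiguous 8-byte layout.
public class MetricValueCodec {
  public static byte[] encode(Number value) {
    // The fractional part is dropped deliberately, per the discussion above.
    return Bytes.toBytes(value.longValue());
  }

  public static long decode(byte[] cellValue) {
    return Bytes.toLong(cellValue);
  }
}
{code}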
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714416#comment-14714416 ] Junping Du commented on YARN-3816: -- Thanks [~varun_saxena] for the review and comments! bq. If we use same scheme for long or double, we may end up with 4 ORs' for a single metric. Maybe we can use cell tags for aggregation. That's a good point! When I was doing the PoC patch a few weeks ago, YARN-4053 hadn't been brought up for discussion, so I thought it was a little overkill to use a cell tag for specifying only a boolean value. Now it seems to be a good way, but I would prefer to defer this decision to YARN-4053, since there are other priority comments to address here so we can move faster. What do you think? bq. Maybe in TimelineCollector#aggregateMetrics, we should do aggregation only if the flag is enabled. That's true. That's part of the reason why the aggregation flag was added to the metric. Will add the check in the next patch. bq. In TimelineCollector#appendAggregatedMetricsToEntities any reason we are creating separate TimelineEntity objects for each metric ? Maybe create a single entity containing a set of metrics. Nice catch. bq. 3 new maps have been introduced in TimelineCollector and these are used as base to calculate aggregated value. What if the daemon crashes? For the RM, it could persist the maps to the RMStateStore. For the NM, that may not be enough as the NM could be lost as well. We need a mechanism so that, if the TimelineCollector is relaunched somewhere else, it reads the raw metrics and recovers the maps before it starts working. This will be part of the failover JIRAs like YARN-3115, YARN-3359, etc. bq. In TimelineMetricCalculator some functions have duplicate if conditions for long. Fixed. bq. In TimelineMetricCalculator#sum, to avoid negative values due to overflow, we can change conditions like below... As in the comment above, the overflow case will be handled in the next patch. bq. In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB? I would prefer to use TreeMap because it sorts the keys (timestamps) when accessing them; the aggregateTo() algorithm assumes metrics are sorted by timestamp. bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some util class. TimelineMetric is part of the object model and exposed to the client, and IIUC aggregateTo won't be called by the client. As Li mentions below, it is a bit tricky to have a utility class for any of the API classes, because it would mislead users into using it, which is not our intention, at least for now. aggregateTo is not as straightforward and generally useful as the methods in TimelineMetricCalculator, so let's hold off on exposing it as a utility class for now. Making it static sounds good though. bq. What is EntityColumnPrefix#AGGREGATED_METRICS meant for? It is something developed at the PoC stage a few weeks ago, and it should be removed after we move to the ApplicationTable.
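Since the overflow handling is deferred to the next patch, here is a generic sketch of guarding a long sum against wrap-around (illustrative only, not the actual YARN-3816 code):
{code}
// Saturate at the extremes instead of silently overflowing into a wrong value.
public final class OverflowSafeSum {
  private OverflowSafeSum() {
  }

  public static long sum(long a, long b) {
    if (b > 0 && a > Long.MAX_VALUE - b) {
      return Long.MAX_VALUE;
    }
    if (b < 0 && a < Long.MIN_VALUE - b) {
      return Long.MIN_VALUE;
    }
    return a + b;
  }
}
{code}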
> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713541#comment-14713541 ] Allen Wittenauer edited comment on YARN-4084 at 8/26/15 2:36 PM: - Use -DskipTests in addition to -Dmaven.test.skip=true was (Author: aw): Use -PskipTests in addition to -Dmaven.test.skip=true > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713541#comment-14713541 ] Allen Wittenauer commented on YARN-4084: Use -PskipTests in addition to -Dmaven.test.skip=true > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713076#comment-14713076 ] Bibin A Chundatt commented on YARN-4029: Hi [~xgong], could you please review the attached patch? Also, can we add this jira as a subtask of YARN-431? > Update LogAggregationStatus to store on finish > -- > > Key: YARN-4029 > URL: https://issues.apache.org/jira/browse/YARN-4029 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4029.patch, Image.jpg > > > Currently the log aggregation status is not getting updated to Store. When RM > is restarted will show NOT_START. > Steps to reproduce > > 1.Submit mapreduce application > 2.Wait for completion > 3.Once application is completed switch RM > *Log Aggregation Status* are changing > *Log Aggregation Status* from SUCCESS to NOT_START -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713002#comment-14713002 ] Bibin A Chundatt commented on YARN-3893: Above comments are for https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713001#comment-14713001 ] Bibin A Chundatt commented on YARN-3893: Test failures are not related to this patch. I have looked into the failed testcases: {{hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens}} - due to a Bind exception. {{hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService}} - verified locally, it's working fine and passes. {{hadoop.yarn.server.resourcemanager.TestClientRMService}} - ran locally in Eclipse, it's working fine. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712999#comment-14712999 ] Hadoop QA commented on YARN-3893: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 39s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 93m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752437/0008-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8916/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8916/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8916/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712984#comment-14712984 ] Brahma Reddy Battula commented on YARN-3528: Testcase failures are unrelated. {{TestResourceLocalizationService}} is failing while cleaning up dirs: {noformat} Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.205 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.275 sec <<< ERROR! java.lang.IllegalArgumentException: target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/3/filecache/10 does not exist at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) {noformat} > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712975#comment-14712975 ] Hadoop QA commented on YARN-3528: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 48s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 48s | Tests passed in hadoop-common. | | {color:red}-1{color} | yarn tests | 7m 27s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 53m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752441/YARN-3528-006.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8917/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8917/console | This message was automatically generated. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. 
> * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712974#comment-14712974 ] Hadoop QA commented on YARN-3893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 14s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752434/0007-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8915/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8915/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. 
> # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712942#comment-14712942 ] Hadoop QA commented on YARN-3893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 9s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 90m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752428/0006-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8914/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8914/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8914/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712931#comment-14712931 ] Brahma Reddy Battula commented on YARN-3528: [~rkanter] Sorry for the delay and thanks for pinging. Attached the patch, kindly review. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3528: --- Attachment: YARN-3528-006.patch > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712896#comment-14712896 ] Varun Saxena commented on YARN-3893: The latest patch, 0008-YARN-3893.patch LGTM. +1 pending Jenkins. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0008-YARN-3893.patch > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0007-YARN-3893.patch Missed one comment: the {{isRMActive}} check is not required. Attaching the patch again. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0006-YARN-3893.patch So JVM exit is the conclusion after discussion. Attaching patch based on the same > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712767#comment-14712767 ] Varun Saxena commented on YARN-3893: Yes I agree. We can exit JVM directly. No need of using fail fast. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712745#comment-14712745 ] Sunil G commented on YARN-3893: --- As I see it, exiting the JVM is reasonable, as Rohith proposed earlier. In most of these cases the scheduler configuration itself is wrong, so switching to standby or relying on fail-fast is not required. Exiting the JVM directly is clean, and the logs will carry enough information to analyze the configuration failure. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
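To make the proposal concrete, here is a rough sketch of the JVM-exit approach under discussion, assuming a small helper is invoked when refreshAll() throws during transition to active. This is illustrative only and is not the code in the attached patches:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.ExitUtil;

/**
 * Illustrative sketch: when a config refresh fails while transitioning to
 * active, log the cause and exit the JVM so the peer RM can take over and
 * the bad configuration is surfaced in the logs.
 */
public final class RefreshFailurePolicy {
  private static final Log LOG = LogFactory.getLog(RefreshFailurePolicy.class);

  private RefreshFailurePolicy() {
  }

  public static void onRefreshFailure(Throwable cause) {
    LOG.fatal("refreshAll() failed during transition to active; exiting RM", cause);
    // ExitUtil rather than System.exit so tests can intercept the termination.
    ExitUtil.terminate(-1, "refreshAll() failed during transition to active");
  }
}
{code}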
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712700#comment-14712700 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~vinodkv] [~zxu] could you check the latest patches? > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Labels: 2.6.1-candidate > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, > YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, > YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712672#comment-14712672 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 18s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 31s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 7m 44s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 53m 29s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 116m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752399/YARN-2884-V11.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8913/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8913/console | This message was automatically generated. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, > YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, > YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712637#comment-14712637 ] Varun Saxena commented on YARN-3893: In fact, in my view we can crash the RM in all cases if the config is wrong, because until the config is corrected, the RM with the wrong config cannot become active (and hence is unusable). In that case the fail-fast config would not even be required. So should we change the behavior to keep the RM in standby (but up) if fail-fast is set to false? Anyway, we can discuss this in more detail face to face. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712619#comment-14712619 ] Varun Saxena commented on YARN-3893: I do not have any concern about exiting the JVM. If fail-fast is true (the default behavior), the JVM will exit anyway. I was wondering whether it would be semantically appropriate to make the JVM exit in some cases when somebody has explicitly changed the fail-fast config to false. Logs can fill up if yarn-site.xml is wrong on both RMs too. I am not sure about the webapp part though. Does it require the client RM service to be initialized? AFAIK, if the RM is standby it will hit the webapp filter and redirect to the other RM (which may be active). I haven't tested the UI after applying the previous patches, so maybe Bibin can tell. If there are issues with the webapp, we will have to exit the JVM when the transition to standby fails, because there may be no other way out then. I will discuss this further with you offline. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
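For context on the webapp question, here is a simplified sketch of the kind of redirect a standby RM's web filter could perform, using only the generic servlet API; this is not the actual RMWebAppFilter implementation, and the class and field names are illustrative:
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Illustrative only: when this RM is standby, redirect the browser to the
 * other RM's web address instead of serving the request locally.
 */
public class StandbyRedirectFilter implements Filter {
  private volatile boolean standby;        // would be driven by the RM HA state
  private volatile String otherRMWebUrl;   // e.g. "http://rm2:8088"

  @Override
  public void init(FilterConfig conf) {
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    if (standby && otherRMWebUrl != null) {
      // Send the browser to the peer RM rather than serving a standby page.
      httpRes.sendRedirect(otherRMWebUrl + httpReq.getRequestURI());
    } else {
      chain.doFilter(req, res);
    }
  }

  @Override
  public void destroy() {
  }
}
{code}
If this RM later becomes active again, clearing the standby flag lets requests be served locally without restarting the web server.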