[jira] [Updated] (YARN-9547) ContainerStatusPBImpl default execution type is not returned
[ https://issues.apache.org/jira/browse/YARN-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T updated YARN-9547:
----------------------------
    Attachment: YARN-9547-001.patch

> ContainerStatusPBImpl default execution type is not returned
> ------------------------------------------------------------
>
>                 Key: YARN-9547
>                 URL: https://issues.apache.org/jira/browse/YARN-9547
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: YARN-9547-001.patch
>
> {code}
> @Override
> public synchronized ExecutionType getExecutionType() {
>   ContainerStatusProtoOrBuilder p = viaProto ? proto : builder;
>   if (!p.hasExecutionType()) {
>     return null;
>   }
>   return convertFromProtoFormat(p.getExecutionType());
> }
> {code}
> ContainerStatusPBImpl executionType should return default as
> ExecutionType.GUARANTEED.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
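The proposed behavior can be illustrated with a minimal, self-contained sketch. This is not the actual ContainerStatusPBImpl (the class and enum below are simplified stand-ins); it only shows the change being asked for: return ExecutionType.GUARANTEED when the proto field is unset, instead of null.

```java
// Illustrative sketch only: a simplified stand-in for ContainerStatusPBImpl
// showing the requested default -- GUARANTEED when the proto carries no
// execution type, instead of returning null.
public class ExecutionTypeDefaultSketch {
  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  // Stand-in for the protobuf's optional field; null means "field unset".
  private final ExecutionType protoValue;

  ExecutionTypeDefaultSketch(ExecutionType protoValue) {
    this.protoValue = protoValue;
  }

  public synchronized ExecutionType getExecutionType() {
    if (protoValue == null) {
      // Proposed change: fall back to the default rather than return null.
      return ExecutionType.GUARANTEED;
    }
    return protoValue;
  }

  public static void main(String[] args) {
    System.out.println(new ExecutionTypeDefaultSketch(null).getExecutionType());
    System.out.println(new ExecutionTypeDefaultSketch(
        ExecutionType.OPPORTUNISTIC).getExecutionType());
  }
}
```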
[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow
[ https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839135#comment-16839135 ]

Bilwa S T commented on YARN-9508:
---------------------------------

{quote}Hi [~bibinchundatt]
We cannot skip queueName because we need it when queueInfo is null. I have removed the scheduler parameter.
{quote}

> YarnConfiguration areNodeLabel enabled is costly in allocation flow
> -------------------------------------------------------------------
>
>                 Key: YARN-9508
>                 URL: https://issues.apache.org/jira/browse/YARN-9508
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bilwa S T
>            Priority: Critical
>         Attachments: YARN-9508-001.patch, YARN-9508-002.patch, YARN-9508-003.patch
>
> Locking can be avoided for every allocate request, improving performance.
> {noformat}
> "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 waiting for monitor entry [0x7f1ec6a8d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>         - waiting to lock <0x7f1f8107c748> (a org.apache.hadoop.yarn.conf.YarnConfiguration)
>         at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>         at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
>         at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674)
>         at org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261)
>         at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427)
>         - locked <0x7f24dd3f9e40> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>         at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>         at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>         at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>         at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
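The stack trace shows every allocate call blocking on the synchronized Configuration.getProps() just to read one boolean. A common remedy, sketched below under assumed names (this is not the actual YARN-9508 patch; NodeLabelFlagCache and ConfigSource are hypothetical stand-ins), is to read the flag once and cache it so the hot path is reduced to a volatile read:

```java
// Illustrative sketch: cache a boolean config flag at first use so a hot
// path does not repeatedly enter a synchronized properties lookup.
// "NodeLabelFlagCache" and "ConfigSource" are hypothetical names.
public class NodeLabelFlagCache {
  interface ConfigSource {
    // Stands in for the synchronized Configuration.getBoolean(...) call.
    boolean getBoolean(String key, boolean defaultValue);
  }

  private final ConfigSource conf;
  private volatile Boolean nodeLabelsEnabled; // null until first read

  NodeLabelFlagCache(ConfigSource conf) {
    this.conf = conf;
  }

  boolean areNodeLabelsEnabled() {
    Boolean cached = nodeLabelsEnabled;
    if (cached == null) {
      // At most a few threads race here once; the flag is effectively
      // immutable, so a duplicate read is harmless. Afterwards every
      // allocate call costs a single volatile read.
      cached = conf.getBoolean("yarn.node-labels.enabled", false);
      nodeLabelsEnabled = cached;
    }
    return cached;
  }
}
```

This trades away the ability to pick up a changed value at runtime, which is acceptable for a flag that is fixed for the lifetime of the ResourceManager.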
[jira] [Updated] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow
[ https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T updated YARN-9508:
----------------------------
    Attachment: YARN-9508-003.patch
[jira] [Assigned] (YARN-9301) Too many InvalidStateTransitionException with SLS
[ https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T reassigned YARN-9301:
-------------------------------
    Assignee: Bilwa S T

> Too many InvalidStateTransitionException with SLS
> -------------------------------------------------
>
>                 Key: YARN-9301
>                 URL: https://issues.apache.org/jira/browse/YARN-9301
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bilwa S T
>            Priority: Major
>              Labels: simulator
>
> Too many InvalidStateTransitionExceptions:
> {noformat}
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: LAUNCHED at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
>         at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
>         at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>         at java.lang.Thread.run(Thread.java:745)
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED on container container_1550059705491_0067_01_01
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
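The issue carries no patch, so the following is only a hedged sketch of one common way such noise is eliminated: register the duplicate event (LAUNCHED arriving while the container is already RUNNING, e.g. replayed by the simulator) as an explicit no-op transition instead of letting the state machine throw. The states and events below are simplified stand-ins, not RMContainerImpl's real state machine.

```java
// Illustrative sketch: tolerate a duplicate LAUNCHED event as a no-op
// instead of throwing an InvalidStateTransitionException. States and
// events are simplified stand-ins for RMContainerImpl's.
public class ContainerStateSketch {
  enum State { ACQUIRED, RUNNING }
  enum Event { LAUNCHED }

  private State state = State.ACQUIRED;

  State handle(Event event) {
    switch (state) {
      case ACQUIRED:
        if (event == Event.LAUNCHED) {
          state = State.RUNNING; // the normal transition
        }
        break;
      case RUNNING:
        // Duplicate LAUNCHED while already RUNNING: registered as an
        // explicit no-op, so no exception and no ERROR log line.
        break;
    }
    return state;
  }
}
```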
[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.
[ https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839076#comment-16839076 ]

K G Bakthavachalam commented on YARN-9435:
------------------------------------------

[~abmodi] destroy is not handled on the RM side, so when we manually transition the RM from standby to active, the metrics never get registered again: the instance is never null, so the inner null check is always false.
{code:java}
public static OpportunisticSchedulerMetrics getMetrics() {
  if (!isInitialized.get()) {
    synchronized (OpportunisticSchedulerMetrics.class) {
      if (INSTANCE == null) {
        INSTANCE = new OpportunisticSchedulerMetrics();
        registerMetrics();
        isInitialized.set(true);
      }
    }
  }
  return INSTANCE;
}
{code}

> Add Opportunistic Scheduler metrics in ResourceManager.
> -------------------------------------------------------
>
>                 Key: YARN-9435
>                 URL: https://issues.apache.org/jira/browse/YARN-9435
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9435.001.patch, YARN-9435.002.patch, YARN-9435.003.patch, YARN-9435.004.patch
>
> Right now there are no metrics available for the Opportunistic Scheduler at the ResourceManager. As part of this jira, we will add metrics like the number of allocated opportunistic containers, released opportunistic containers, node-level allocations, rack-level allocations, etc. for the Opportunistic Scheduler.
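The point of the comment is that the singleton needs a destroy path that clears both the instance and the initialization gate, or getMetrics() can never re-register after a standby-to-active transition. A self-contained sketch of that pattern (class and field names are hypothetical stand-ins, and a counter replaces the real registerMetrics()):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the pattern discussed in the comment above:
// gate singleton (re-)creation on an AtomicBoolean that destroy() clears,
// so metrics can be registered again after an RM failover. Names are
// hypothetical stand-ins, not the actual YARN classes.
public class ResettableMetricsSingleton {
  private static final AtomicBoolean isInitialized = new AtomicBoolean(false);
  private static volatile ResettableMetricsSingleton INSTANCE;
  static int registrations = 0; // stands in for registerMetrics() side effects

  public static ResettableMetricsSingleton getMetrics() {
    if (!isInitialized.get()) {
      synchronized (ResettableMetricsSingleton.class) {
        if (INSTANCE == null) {
          INSTANCE = new ResettableMetricsSingleton();
          registrations++; // stands in for registerMetrics()
          isInitialized.set(true);
        }
      }
    }
    return INSTANCE;
  }

  public static void destroy() {
    synchronized (ResettableMetricsSingleton.class) {
      INSTANCE = null;           // without clearing both fields, the inner
      isInitialized.set(false);  // null check blocks re-registration forever
    }
  }
}
```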
[jira] [Commented] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839068#comment-16839068 ]

Jack Zhu commented on YARN-9549:
--------------------------------

Thanks for your reply, I have attached my yarn-site.xml.

> Not able to run pyspark in docker driver container on Yarn3
> -----------------------------------------------------------
>
>                 Key: YARN-9549
>                 URL: https://issues.apache.org/jira/browse/YARN-9549
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.2
>         Environment: Hadoop 3.1.1.3.1.0.0-78
> spark version 2.3.2.3.1.0.0-78
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
> Server: Docker Engine - Community Version: 18.09.6
>            Reporter: Jack Zhu
>            Priority: Critical
>         Attachments: Dockerfile, test.py, yarn-site.xml
>
> I followed
> [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
> to build a spark docker image to run pyspark. There is no good document
> describing how to spark-submit a pyspark job to a hadoop3 cluster, so I used
> the command below to launch my simple python job:
> {noformat}
> PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn \
>   --deploy-mode cluster --num-executors 3 --executor-memory 1g \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker ./test.py
> {noformat}
> test.py only collects the hostname from each executor and checks whether
> the python job runs in a container or not.
> I found that the driver always runs directly on the host, not in the
> container. As a result we need to keep the python version in the docker
> image consistent with the nodemanager's, which makes it meaningless to use
> docker to package all the dependencies.
> The spark job runs successfully; below is the stdout:
> {noformat}
> Log Type: stdout
> Log Upload Time: Tue May 14 02:07:06 + 2019
> Log Length: 141
> host.test.com
> False
> going to print all the container names. [True, True, True, True, True, True, True, True, True]
> {noformat}
> Please see the attached Dockerfile and test.py.
[jira] [Updated] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Zhu updated YARN-9549:
---------------------------
    Attachment: yarn-site.xml
[jira] [Created] (YARN-9550) Suspect wrong way to calculate container utilized vcore.
Sihai Ke created YARN-9550:
------------------------------

             Summary: Suspect wrong way to calculate container utilized vcore.
                 Key: YARN-9550
                 URL: https://issues.apache.org/jira/browse/YARN-9550
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.9.1
            Reporter: Sihai Ke

In hadoop 2.9.1, class *ContainersMonitorImpl* line 664, I suspect it uses the wrong way to calculate milliVcoresUsed. Below is the code.
{code:java}
ResourceCalculatorProcessTree pTree = ptInfo.getProcessTree();
pTree.updateProcessTree();    // update process-tree
if (!pTree.isValidData()) {
  // If we cannot get the data for one container, we ignore it all
  LOG.error("Cannot get the data for " + pId);
  trackedContainersUtilization = null;
  continue;
}

long currentVmemUsage = pTree.getVirtualMemorySize();
long currentPmemUsage = pTree.getRssMemorySize();

// if machine has 6 cores and 3 are used,
// cpuUsagePercentPerCore should be 300% and
// cpuUsageTotalCoresPercentage should be 50%
float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
if (cpuUsagePercentPerCore < 0) {
  // CPU usage is not available likely because the container just
  // started. Let us skip this turn and consider this container
  // in the next iteration.
  LOG.info("Skipping monitoring container " + containerId
      + " since CPU usage is not yet available.");
  continue;
}
float cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore /
    resourceCalculatorPlugin.getNumProcessors();

// Multiply by 1000 to avoid losing data when converting to int
int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000
    * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);

// milliPcoresUsed = (int) (cpuUsagePercentPerCore * 1000 / 100;
// As cpuUsagePercentagePerCore use 100 to represent 1 single core.
int milliPcoresUsed = (int) (cpuUsagePercentPerCore * 10);

// as processes begin with an age 1, we want to see if there
// are processes more than 1 iteration old.
vcoresUsageByAllContainers += milliVcoresUsed;
pcoresByAllContainers += milliPcoresUsed;
{code}
I think
{code:java}
int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000
    * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);
{code}
should be
{code:java}
int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000
    * maxVCoresAllottedForContainers);
{code}
I think it does not need to divide by nodeCpuPercentageForYARN. [~kasha], it looks like you added this feature; could you help take a look, or educate me if I am wrong?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
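To make the question concrete, here are worked numbers for both formulas under assumed inputs (a 6-core machine with 3 cores busy, 6 vcores allotted to containers, and the default nodeCpuPercentageForYARN = 100). Note that cpuUsageTotalCoresPercentage is a percentage (50 means 50%), so when the YARN CPU percentage is 100 the division also serves as the percent-to-fraction conversion; this sketch only illustrates the arithmetic, it does not settle which formula is intended.

```java
// Worked numbers for the two formulas discussed above. All inputs are
// assumed sample values, not measurements.
public class MilliVcoreArithmetic {
  public static void main(String[] args) {
    float cpuUsagePercentPerCore = 300f;  // 3 of 6 cores busy (300%)
    int numProcessors = 6;
    int maxVCoresAllottedForContainers = 6;
    int nodeCpuPercentageForYARN = 100;   // default: all CPU for YARN

    float cpuUsageTotalCoresPercentage =
        cpuUsagePercentPerCore / numProcessors;  // 50 (a percentage)

    // Formula as in ContainersMonitorImpl (with the division):
    int withDivide = (int) (cpuUsageTotalCoresPercentage * 1000
        * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);

    // Formula proposed in the report (without the division):
    int withoutDivide = (int) (cpuUsageTotalCoresPercentage * 1000
        * maxVCoresAllottedForContainers);

    System.out.println(withDivide);     // 3000 milli-vcores, i.e. 3 vcores
    System.out.println(withoutDivide);  // 300000 milli-vcores, i.e. 300 vcores
  }
}
```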
[jira] [Commented] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839061#comment-16839061 ]

Vinod Kumar Vavilapalli commented on YARN-9549:
-----------------------------------------------

This is certainly a configuration issue. Please post your yarn-site.xml.

You are better off hitting the user lists first for issues like these.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shurong Mai updated YARN-9518:
------------------------------
    Affects Version/s:     (was: 3.1.2)
                           (was: 2.8.5)
                           (was: 2.9.2)
                           (was: 3.2.0)

> can't use CGroups with YARN in centos7
> --------------------------------------
>
>                 Key: YARN-9518
>                 URL: https://issues.apache.org/jira/browse/YARN-9518
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.7
>            Reporter: Shurong Mai
>            Priority: Major
>              Labels: cgroup, patch
>         Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
> The OS version is centos7.
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I had set the configuration variables for cgroups with yarn, the nodemanager started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs at the end of this description.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows:
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct
> {code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links.
> Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so it gets the cpu and cpuacct subsystem paths as "/sys/fs/cgroup/cpu,cpuacct".
> The resource description argument passed to container-executor is then:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, container-executor truncates the cgroup path to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes:
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".
> After applying the patch, the problem is resolved and the job runs successfully.
> The patch is compatible with the cgroup paths of older OS versions such as centos6 as well as centos7, and applies generally to cgroup subsystem paths such as the cgroup network subsystems:
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio
> {code}
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING
> 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27
> 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27
> ExitCodeException exitCode=27:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>         at org.apache.hadoop.util.Shell.run(Shell.java:482)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
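The failure mode described above can be demonstrated with a tiny sketch: if the resources string is split on commas, a mount path that itself contains a comma is truncated at the comma. The split rule below mirrors the behavior described in the report; it is a stand-in for, not a copy of, the actual container-executor C parser, and the ".../tasks" paths are shortened examples.

```java
// Illustrative sketch of why a comma inside "cpu,cpuacct" breaks the
// container-executor argument: ',' separates resources, so a mount path
// containing a comma is cut short. Not the actual C parser.
public class CgroupCommaSplit {
  static String firstResource(String cgroupsArg) {
    // container-executor treats ',' as the separator between resources.
    return cgroupsArg.split(",")[0];
  }

  public static void main(String[] args) {
    // Merged centos7-style path: truncated at the comma.
    System.out.println(
        firstResource("cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/tasks"));
    // Symlink path as in the patch: survives intact, no comma present.
    System.out.println(
        firstResource("cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/tasks"));
  }
}
```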
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839054#comment-16839054 ]

Shurong Mai edited comment on YARN-9518 at 5/14/19 3:54 AM:
------------------------------------------------------------

[~Jim_Brennan], thank you very much. You are right. You said "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma", so the problem of a cgroup path containing a comma is not a problem after release 2.8. I removed "2.8.5, 2.9.2, 3.1.2, 3.2.0" from the affected versions and kept "2.7.7". We are running the 2.7.7 release.

I said to [~jhung] "YARN-2194 looks like the same problem as this issue, but it supplies another, different solution". Therefore, my patch also supplies a solution for version 2.7.7 and older versions.

Thank you a lot again.

was (Author: shurong.mai):
[~Jim_Brennan], thank you very much. "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma"
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839054#comment-16839054 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], thank you very much. "The variable LINUX_PATH_SEPARATOR (which is {{%}}) is now used as a separator instead of comma" > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518-branch-2.7.7.001.patch, > YARN-9518-trunk.001.patch, YARN-9518.patch > > > The OS version is CentOS 7. > {code:java} > cat /etc/redhat-release > CentOS Linux release 7.3.1611 (Core) > {code} > After I set the cgroup configuration variables for YARN, the nodemanager started without any problem, but when I ran a job, the job failed with the exceptional nodemanager logs quoted at the end. > The important line in those logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". > After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are laid out as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in CentOS 7 they look like this: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and "cpuacct" are symbolic links. > Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so it resolves both the cpu and cpuacct subsystem paths to "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description argument of container-executor then looks like this: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore the cgroup path is truncated by container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory" is reported in the log. > Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". > After applying the patch, the problem is resolved and the job runs successfully. 
> The patch remains compatible with the cgroup paths of earlier OS versions such as CentOS 6 as well as CentOS 7, and it applies equally to other cgroup subsystem paths, such as the network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING > 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.ut
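The truncation described in this report can be sketched in a few lines. This is a hypothetical Java illustration (the real container-executor is written in C); it only mirrors the behaviour of splitting the "cgroups=..." resource list on commas, to show why the merged CentOS 7 path breaks while the comma-free symlinked path survives intact:

```java
// Hypothetical illustration of the parsing problem described above.
// The real container-executor is C code; this sketch only mimics its
// comma-splitting of the "cgroups=..." resource-list argument.
public class CgroupPathDemo {

    // Each comma-separated piece is treated as a separate resource path.
    static String[] splitResources(String cgroupsArg) {
        return cgroupsArg.split(",");
    }

    public static void main(String[] args) {
        // CentOS 7 merged path: the comma inside the path truncates it to
        // "/sys/fs/cgroup/cpu", which is a directory, hence the error
        // "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
        String merged = "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/tasks";
        System.out.println(splitResources(merged)[0]); // /sys/fs/cgroup/cpu

        // The patch's approach: use the comma-free symlink, which parses intact.
        String symlinked = "/sys/fs/cgroup/cpu/hadoop-yarn/tasks";
        System.out.println(splitResources(symlinked)[0]); // the full path
    }
}
```

The same reasoning explains why the fix generalizes to any co-mounted controller pair (net_cls,net_prio and so on): the per-controller symlink never contains a comma.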
[jira] [Created] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
Jack Zhu created YARN-9549: -- Summary: Not able to run pyspark in docker driver container on Yarn3 Key: YARN-9549 URL: https://issues.apache.org/jira/browse/YARN-9549 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.1.2 Environment: Hadoop 3.1.1.3.1.0.0-78 spark version 2.3.2.3.1.0.0-78 Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211 Server: Docker Engine - Community Version: 18.09.6 Reporter: Jack Zhu Attachments: Dockerfile, test.py I followed [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html] to build a Spark docker image to run pyspark. There isn't a good document describing how to spark-submit a pyspark job to a Hadoop 3 cluster, so I used the command below to launch my simple Python job: PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 1g --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker ./test.py test.py simply collects the hostname from the executors and checks whether the Python job runs in a container or not. I found that the driver always runs directly on the host, not in the container; as a result we need to keep the Python version in the docker image consistent with the nodemanager's, which makes it meaningless to use docker to package all the dependencies. The Spark job runs successfully; below is the stdout: Log Type: stdout Log Upload Time: Tue May 14 02:07:06 + 2019 Log Length: 141 host.test.com False >going to print all the container names. 
[True, True, True, True, True, True, True, True, True] please see attached Dockerfile and test.py -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9548) [Umbrella] Make YARN work well in elastic cloud environments
Vinod Kumar Vavilapalli created YARN-9548: - Summary: [Umbrella] Make YARN work well in elastic cloud environments Key: YARN-9548 URL: https://issues.apache.org/jira/browse/YARN-9548 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli YARN works well in static environments, and there isn't anything fundamentally broken in YARN to stop us from making it work well in dynamic environments like the cloud (public or private) as well. There are a few areas where we need to invest, though: # Autoscaling -- cluster level: add/remove nodes intelligently based on metrics and/or admin plugins -- node level: scale nodes up/down vertically? # Smarter scheduling -- to pack containers as opposed to spreading them around, to account for nodes going away -- to account for speculative nodes like spot instances # Handling nodes going away better -- by decommissioning sanely -- dealing with auxiliary services data # And any installation helpers in this dynamic world - scripts, operators etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS
[ https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838893#comment-16838893 ] Eric Payne commented on YARN-8625: -- [~Prabhu Joseph], I'm sorry but I can't get applications to show up in http://:/applicationhistory with NullApplicationHistoryStore set for {{yarn.timeline-service.generic-application-history.store-class}}. Because of this, curl doesn't return anything. I have tried to mimic your yarn-site.xml configuration as much as possible. > Aggregate Resource Allocation for each job is not present in ATS > > > Key: YARN-8625 > URL: https://issues.apache.org/jira/browse/YARN-8625 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Affects Versions: 2.7.4 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, yarn-site.xml > > > Aggregate Resource Allocation shown on RM UI for finished job is very useful > metric to understand how much resource a job has consumed. But this does not > get stored in ATS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838844#comment-16838844 ] Suma Shivaprasad commented on YARN-9519: Thanks for the patch [~adam.antal]. Patch LGTM. +1. > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch, YARN-9519.002.patch, > YARN-9519.003.patch, YARN-9519.004.patch, YARN-9519.005.patch > > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the format-specific config, which therefore takes priority. 
> From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (Where these configs are: ) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the yarn.nodemanager.remote-app-log-dir config only if that is not specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
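The suggested lookup order can be sketched as follows. This is a minimal, hypothetical sketch in which a plain Map stands in for Hadoop's Configuration; the two config key names are the real ones quoted above, while the class name, method name, and default value are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the suggested lookup order for TFile, mirroring what
// IFile already does. A plain Map stands in for Hadoop's Configuration;
// the key names are real, the default and helper names are illustrative.
public class RemoteLogDirLookup {
    static final String FMT = "yarn.log-aggregation.%s.remote-app-log-dir";
    static final String NM_KEY = "yarn.nodemanager.remote-app-log-dir";
    static final String NM_DEFAULT = "/tmp/logs"; // assumed default

    // Try the controller-specific key first, then fall back to the NM-wide key.
    static String resolveRemoteDir(Map<String, String> conf, String controller) {
        String specific = conf.get(String.format(FMT, controller));
        if (specific != null && !specific.isEmpty()) {
            return specific;
        }
        return conf.getOrDefault(NM_KEY, NM_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(String.format(FMT, "TFile"), "/tfile-logs");
        conf.put(NM_KEY, "/app-logs");
        // The controller-specific setting wins for TFile...
        System.out.println(resolveRemoteDir(conf, "TFile")); // /tfile-logs
        // ...and a controller without one falls back to the NM-wide setting.
        System.out.println(resolveRemoteDir(conf, "IFile")); // /app-logs
    }
}
```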
[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler
[ https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838793#comment-16838793 ] Íñigo Goiri commented on YARN-9505: --- What about making the final variables fields, with the proper capitalization and so on? No need to use them in other tests for now, but useful for future ones. > Add container allocation latency for Opportunistic Scheduler > > > Key: YARN-9505 > URL: https://issues.apache.org/jira/browse/YARN-9505 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9505.001.patch, YARN-9505.002.patch, > YARN-9505.003.patch, YARN-9505.004.patch > > > This will help in tuning the opportunistic scheduler and its configuration > parameters. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9453) Clean up code long if-else chain in ApplicationCLI#run
[ https://issues.apache.org/jira/browse/YARN-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838784#comment-16838784 ] Hudson commented on YARN-9453: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16544 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16544/]) YARN-9453. Clean up code long if-else chain in ApplicationCLI#run. (gifuma: rev 206e6339469ca6d362382efbb488089ece830e98) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java > Clean up code long if-else chain in ApplicationCLI#run > -- > > Key: YARN-9453 > URL: https://issues.apache.org/jira/browse/YARN-9453 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Wanqiang Ji >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > Attachments: YARN-9453.001.patch, YARN-9453.002.patch, > YARN-9453.003.patch, YARN-9453.004.patch > > > org.apache.hadoop.yarn.client.cli.ApplicationCLI#run is 630 lines long, > contains a long if-else chain and a great many conditions. > As a start, the bodies of the conditions could be extracted to methods, and a > cleaner solution could be introduced to parse the argument values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9453) Clean up code long if-else chain in ApplicationCLI#run
[ https://issues.apache.org/jira/browse/YARN-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9453: --- Fix Version/s: 3.3.0 > Clean up code long if-else chain in ApplicationCLI#run > -- > > Key: YARN-9453 > URL: https://issues.apache.org/jira/browse/YARN-9453 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Wanqiang Ji >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > Attachments: YARN-9453.001.patch, YARN-9453.002.patch, > YARN-9453.003.patch, YARN-9453.004.patch > > > org.apache.hadoop.yarn.client.cli.ApplicationCLI#run is 630 lines long, > contains a long if-else chain and a great many conditions. > As a start, the bodies of the conditions could be extracted to methods, and a > cleaner solution could be introduced to parse the argument values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9453) Clean up code long if-else chain in ApplicationCLI#run
[ https://issues.apache.org/jira/browse/YARN-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838772#comment-16838772 ] Giovanni Matteo Fumarola commented on YARN-9453: Thanks [~jiwq] for the patch. I double-checked with the help of a compare tool whether anything was missed. [^YARN-9453.004.patch] committed to trunk. Thanks [~snemeth] for the help in reviewing. > Clean up code long if-else chain in ApplicationCLI#run > -- > > Key: YARN-9453 > URL: https://issues.apache.org/jira/browse/YARN-9453 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Wanqiang Ji >Priority: Major > Labels: newbie > Attachments: YARN-9453.001.patch, YARN-9453.002.patch, > YARN-9453.003.patch, YARN-9453.004.patch > > > org.apache.hadoop.yarn.client.cli.ApplicationCLI#run is 630 lines long, > contains a long if-else chain and a great many conditions. > As a start, the bodies of the conditions could be extracted to methods, and a > cleaner solution could be introduced to parse the argument values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9493) Scheduler Page does not display the right page by query string
[ https://issues.apache.org/jira/browse/YARN-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838770#comment-16838770 ] Hudson commented on YARN-9493: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16543 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16543/]) YARN-9493. Scheduler Page does not display the right page by query (gifuma: rev 29ff7fb1400efecfb71491ac97194a229d0af8de) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/SchedulerPageUtil.java > Scheduler Page does not display the right page by query string > -- > > Key: YARN-9493 > URL: https://issues.apache.org/jira/browse/YARN-9493 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager, webapp >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: Actual-1.png, Actual-2.png, YARN-9493.001.patch, > YARN-9493.002.patch, YARN-9493.003.patch > > > When using the Capacity Scheduler in the RM, I found some mistakes that cause the WebApp's scheduler page not to display the right page for a given query string. > Some operations that reproduce it: > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue: default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue:%20default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > !Actual-1.png! > I also found that if we repeatedly click one child queue, the window location displays the wrong URL. > !Actual-2.png! > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9493) Scheduler Page does not display the right page by query string
[ https://issues.apache.org/jira/browse/YARN-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9493: --- Fix Version/s: 3.3.0 > Scheduler Page does not display the right page by query string > -- > > Key: YARN-9493 > URL: https://issues.apache.org/jira/browse/YARN-9493 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager, webapp >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: Actual-1.png, Actual-2.png, YARN-9493.001.patch, > YARN-9493.002.patch, YARN-9493.003.patch > > > When using the Capacity Scheduler in the RM, I found some mistakes that cause the WebApp's scheduler page not to display the right page for a given query string. > Some operations that reproduce it: > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue: default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue:%20default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > !Actual-1.png! > I also found that if we repeatedly click one child queue, the window location displays the wrong URL. > !Actual-2.png! > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9493) Scheduler Page does not display the right page by query string
[ https://issues.apache.org/jira/browse/YARN-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9493: --- Summary: Scheduler Page does not display the right page by query string (was: Fix Scheduler Page can't display the right page by query string) > Scheduler Page does not display the right page by query string > -- > > Key: YARN-9493 > URL: https://issues.apache.org/jira/browse/YARN-9493 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager, webapp >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: Actual-1.png, Actual-2.png, YARN-9493.001.patch, > YARN-9493.002.patch, YARN-9493.003.patch > > > When using the Capacity Scheduler in the RM, I found some mistakes that cause the WebApp's scheduler page not to display the right page for a given query string. > Some operations that reproduce it: > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue: default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > * Navigating to a URL like [http://rm:8088/cluster/scheduler?openQueues=Queue:%20default|http://127.0.0.1:8088/cluster/scheduler?openQueues=Queue:%20default#Queue:%20root] > !Actual-1.png! > I also found that if we repeatedly click one child queue, the window location displays the wrong URL. > !Actual-2.png! > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9542) LogsCLI guessAppOwner ignores custom file format suffix
[ https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838653#comment-16838653 ] Hadoop QA commented on YARN-9542: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 1s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 25s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 59s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}101m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9542 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968568/YARN-9542-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8b8efecab547 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1a47c2b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YAR
[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838621#comment-16838621 ] Hadoop QA commented on YARN-9538: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 27m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 17s{color} | {color:red} hadoop-yarn-site in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9538 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968536/YARN-9538.001.patch | | Optional Tests | dupname asflicense mvnsite | | uname | Linux f35f9df672f3 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1a47c2b | | maven | version: Apache Maven 3.3.9 | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/24084/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-site.txt | | Max. process+thread count | 446 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24084/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838609#comment-16838609 ] Adam Antal commented on YARN-9519: -- We're somewhat stuck on configuring file formats properly without this patch. Could you please take a look, [~suma.shivaprasad]? > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch, YARN-9519.002.patch, > YARN-9519.003.patch, YARN-9519.004.patch, YARN-9519.005.patch > > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the format-specific config, which therefore takes priority. 
> From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (where these configs are:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest TFile should try to obtain the remote dir config from > yarn.log-aggregation.TFile.remote-app-log-dir first, and only fall back to > the yarn.nodemanager.remote-app-log-dir config if that is not specified.
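The lookup order suggested above can be sketched outside of Hadoop as follows. This is only an illustration, not the actual patch: a plain java.util.Map stands in for Hadoop's Configuration, and the "/tmp/logs" default mirrors what I believe YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR is.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed lookup order for TFile's remote log dir: try the
// per-file-format key first (as IFile already does), then fall back to the
// NM-wide key. A plain Map stands in for Hadoop's Configuration.
public class RemoteLogDirSketch {
    static final String PER_FORMAT_FMT = "yarn.log-aggregation.%s.remote-app-log-dir";
    static final String NM_KEY = "yarn.nodemanager.remote-app-log-dir";
    static final String NM_DEFAULT = "/tmp/logs"; // assumed NM default

    public static String resolveRemoteLogDir(Map<String, String> conf, String controller) {
        // Per-format key wins if set and non-empty...
        String perFormat = conf.get(String.format(PER_FORMAT_FMT, controller));
        if (perFormat != null && !perFormat.isEmpty()) {
            return perFormat;
        }
        // ...otherwise fall back to the NM-wide config (or its default).
        return conf.getOrDefault(NM_KEY, NM_DEFAULT);
    }
}
```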
[jira] [Commented] (YARN-9542) LogsCLI guessAppOwner ignores custom file format suffix
[ https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838564#comment-16838564 ] Prabhu Joseph commented on YARN-9542: - Tested with both FileControllers (ifile and tfile), with log files from both the old and new application log directory structures, and with a custom log dir suffix. Below are the functional test cases: # Run a job as user1. Stop the RM. Fetch application logs as user1. # Fetch application logs as user2, who has access to user1's app log dir. # Fetch application logs as user3, who does not have access. > LogsCLI guessAppOwner ignores custom file format suffix > --- > > Key: YARN-9542 > URL: https://issues.apache.org/jira/browse/YARN-9542 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9542-001.patch > > > LogsCLI guessAppOwner ignores the custom file format suffix > yarn.log-aggregation.%s.remote-app-log-dir-suffix / Default > IndexedFileController Suffix > ({yarn.nodemanager.remote-app-log-dir-suffix}-ifile or logs-ifile). It > considers only yarn.nodemanager.remote-app-log-dir-suffix or the default logs. > *Repro:* > {code} > yarn-site.xml > yarn.log-aggregation.file-formats ifile > yarn.log-aggregation.file-controller.ifile.class > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log-aggregation.ifile.remote-app-log-dir app-logs > yarn.resourcemanager.connect.max-wait.ms 1000 > core-site.xml: > ipc.client.connect.max.retries 3 > ipc.client.connect.retry.interval 10 > Run a Job with above configs and Stop the RM. > [ambari-qa@yarn-ats-1 ~]$ yarn logs -applicationId > application_1557482389195_0001 > 2019-05-10 10:03:58,215 INFO client.RMProxy: Connecting to ResourceManager at > yarn-ats-1/172.26.81.91:8050 > Unable to get ApplicationState. Attempting to fetch logs directly from the > filesystem. > Can not find the appOwner. 
Please specify the correct appOwner > Could not locate application logs for application_1557482389195_0001 > [ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls > /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001 > Found 1 items > -rw-r- 3 ambari-qa supergroup 18058 2019-05-10 10:01 > /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001/yarn-ats-1_45454 > {code}
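The gap described above can be sketched as a list of candidate suffixes that a guessAppOwner-style lookup could try, most specific first. This is an illustration of the idea, not the actual patch; the key names follow the JIRA, and the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: enumerate the log-dir suffixes to try when guessing the app owner.
// Today only the NM-wide suffix (default "logs") is considered, so per-format
// dirs like "logs-ifile" are missed.
public class LogDirSuffixSketch {
    public static List<String> candidateSuffixes(Map<String, String> conf, String format) {
        List<String> out = new ArrayList<>();
        // 1. Explicit per-format suffix, if configured.
        String perFormat = conf.get(
            String.format("yarn.log-aggregation.%s.remote-app-log-dir-suffix", format));
        // NM-wide suffix, defaulting to "logs".
        String nmWide = conf.getOrDefault("yarn.nodemanager.remote-app-log-dir-suffix", "logs");
        if (perFormat != null && !perFormat.isEmpty()) {
            out.add(perFormat);
        } else {
            // 2. Per-format default: <nm-wide suffix>-<format>, e.g. "logs-ifile".
            out.add(nmWide + "-" + format);
        }
        // 3. Plain NM-wide suffix, what LogsCLI already tries.
        out.add(nmWide);
        return out;
    }
}
```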
[jira] [Updated] (YARN-9542) LogsCLI guessAppOwner ignores custom file format suffix
[ https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9542: Attachment: YARN-9542-001.patch > LogsCLI guessAppOwner ignores custom file format suffix > --- > > Key: YARN-9542 > URL: https://issues.apache.org/jira/browse/YARN-9542 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9542-001.patch > > > LogsCLI guessAppOwner ignores custom file format suffix > yarn.log-aggregation.%s.remote-app-log-dir-suffix / Default > IndexedFileController Suffix > ({yarn.nodemanager.remote-app-log-dir-suffix}-ifile or logs-ifile). It > considers only yarn.nodemanager.remote-app-log-dir-suffix or default logs. > *Repro:* > {code} > yarn-site.xml > yarn.log-aggregation.file-formats ifile > yarn.log-aggregation.file-controller.ifile.class > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController > yarn.log-aggregation.ifile.remote-app-log-dir app-logs > yarn.resourcemanager.connect.max-wait.ms 1000 > core-site.xml: > ipc.client.connect.max.retries 3 > ipc.client.connect.retry.interval 10 > Run a Job with above configs and Stop the RM. > [ambari-qa@yarn-ats-1 ~]$ yarn logs -applicationId > application_1557482389195_0001 > 2019-05-10 10:03:58,215 INFO client.RMProxy: Connecting to ResourceManager at > yarn-ats-1/172.26.81.91:8050 > Unable to get ApplicationState. Attempting to fetch logs directly from the > filesystem. > Can not find the appOwner. 
Please specify the correct appOwner > Could not locate application logs for application_1557482389195_0001 > [ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls > /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001 > Found 1 items > -rw-r- 3 ambari-qa supergroup 18058 2019-05-10 10:01 > /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001/yarn-ats-1_45454 > {code}
[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838532#comment-16838532 ] Adam Antal commented on YARN-9525: -- Thanks for the input, [~ste...@apache.org]. When an aggregation cycle ends, the FileFormatController closes the stream and the file is visible in HDFS, so it is forced to be closed by the logic above this; still, it can be investigated further. CC [~pbacsko], who might have more time to look into this. > IFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: No such file or directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 7 more > {noformat} > This stack trace points to > {{LogAggregationIndexedFileController$initializeWriter}}, where we do the > following steps (in a non-rolling log aggregation setup): > - create an FSDataOutputStream > - write out a UUID > - flush > - immediately after that, call GetFileStatus to get the length of the log > file (the bytes we just wrote out), and that's where the failure happens: > the file is not there yet due to eventual consistency. > Maybe we can get rid of that, so we can use IFile format against an s3a target.
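The idea floated above (get rid of the GetFileStatus round-trip) can be sketched like this: track the offset locally from the bytes we ourselves wrote, instead of asking the possibly-inconsistent store right after the flush. This is an illustration under assumptions, not the real patch; an in-memory DataOutputStream stands in for the FSDataOutputStream used by the controller, and the helper name is hypothetical.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch: write the UUID header and derive the current file offset from the
// local byte count rather than from a getFileStatus() call on the store.
public class OffsetTrackingSketch {
    public static long writeUuidAndGetOffset(DataOutputStream out) throws IOException {
        byte[] uuid = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        out.write(uuid);
        out.flush();
        // DataOutputStream.size() counts bytes written through this stream,
        // so no round-trip to the (eventually consistent) object store is needed.
        return out.size();
    }
}
```

Whether the real IFile controller can rely purely on local byte accounting (e.g. across NM restarts or appends by other writers) is exactly the open question in this JIRA; the sketch only shows the consistency-free direction.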
[jira] [Assigned] (YARN-9547) ContainerStatusPBImpl default execution type is not returned
[ https://issues.apache.org/jira/browse/YARN-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T reassigned YARN-9547: --- Assignee: Bilwa S T > ContainerStatusPBImpl default execution type is not returned > > > Key: YARN-9547 > URL: https://issues.apache.org/jira/browse/YARN-9547 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > > {code} > @Override > public synchronized ExecutionType getExecutionType() { > ContainerStatusProtoOrBuilder p = viaProto ? proto : builder; > if (!p.hasExecutionType()) { > return null; > } > return convertFromProtoFormat(p.getExecutionType()); > } > {code} > ContainerStatusPBImpl#getExecutionType should return > ExecutionType.GUARANTEED as the default.
[jira] [Created] (YARN-9547) ContainerStatusPBImpl default execution type is not returned
Bibin A Chundatt created YARN-9547: -- Summary: ContainerStatusPBImpl default execution type is not returned Key: YARN-9547 URL: https://issues.apache.org/jira/browse/YARN-9547 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt {code} @Override public synchronized ExecutionType getExecutionType() { ContainerStatusProtoOrBuilder p = viaProto ? proto : builder; if (!p.hasExecutionType()) { return null; } return convertFromProtoFormat(p.getExecutionType()); } {code} ContainerStatusPBImpl#getExecutionType should return ExecutionType.GUARANTEED as the default.
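The fix requested in this issue can be sketched as follows. The enum and the proto accessor interface below are stand-ins for the real Hadoop classes (the actual change lives in ContainerStatusPBImpl), but the shape of the fallback is what the issue asks for.

```java
// Sketch of the requested fix: when the proto field is unset, return the
// documented default (GUARANTEED) instead of null.
public class ContainerStatusSketch {
    // Stand-in for org.apache.hadoop.yarn.api.records.ExecutionType
    public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // Stand-in for the generated ContainerStatusProtoOrBuilder accessor pair.
    public interface StatusProto {
        boolean hasExecutionType();
        ExecutionType getExecutionType();
    }

    // Before the patch this path returned null when hasExecutionType() was
    // false; the issue says the default should be GUARANTEED.
    public static ExecutionType getExecutionType(StatusProto p) {
        if (!p.hasExecutionType()) {
            return ExecutionType.GUARANTEED;
        }
        return p.getExecutionType();
    }
}
```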
[jira] [Updated] (YARN-9525) IFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9525: - Summary: IFile format is not working against s3a remote folder (was: TFile format is not working against s3a remote folder) > IFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time. > java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused 
by: java.io.FileNotFoundException: No such file or directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 7 more > {noformat} > This stack trace points to > {{LogAggregationIndexedFileController$initializeWriter}}, where we do the > following steps (in a non-rolling log aggregation setup): > - create an FSDataOutputStream > - write out a UUID > - flush > - immediately after that, call GetFileStatus to get the length of the log > file (the bytes we just wrote out), and that's where the failure happens: > the file is not there yet due to eventual consistency. > Maybe we can get rid of that, so we can use IFile format against an s3a target. 
[jira] [Updated] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9538: --- Attachment: YARN-9538.001.patch > Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md.
[jira] [Commented] (YARN-9497) Support grouping by diagnostics for query results of scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838400#comment-16838400 ] Hadoop QA commented on YARN-9497: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 48s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 43s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}144m 40s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9497 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968522/YARN-9497.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a5a2878d3ecf 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1a47c2b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24082/testReport/ | | Max. process+thread count | 904
[jira] [Commented] (YARN-9546) Add configuration option for yarn services AM classpath
[ https://issues.apache.org/jira/browse/YARN-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838348#comment-16838348 ] Szilard Nemeth commented on YARN-9546: -- Hi [~shuzirra]! Could you please add some test coverage for the new parameter as discussed offline? Thanks! > Add configuration option for yarn services AM classpath > --- > > Key: YARN-9546 > URL: https://issues.apache.org/jira/browse/YARN-9546 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: YARN-9546.001.patch > > > For regular containers we have the yarn.application.classpath property, which > allows users to add extra elements to the container's classpath. > However, YARN services deliberately ignore this property to avoid > incompatible class collisions. On systems where the configuration > files for containers are located somewhere other than the path stored in > $HADOOP_CONF_DIR, there is no way to modify the classpath to include other > directories with the actual configuration. > Suggestion: let's create a new property which allows us to add extra elements > to the classpath generated for YARN service AM containers.
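For context, the existing per-container knob mentioned above is set like any other yarn-site.xml property. Only the existing key is shown here, since the name of the proposed service-AM property is not yet decided in this JIRA; the extra directory in the value is a hypothetical example.

{code:xml}
<!-- yarn-site.xml: existing classpath knob for regular containers.
     YARN services ignore this one, which is the gap this JIRA proposes to
     close with a separate, service-AM-specific property.
     "/etc/alternative-conf" below is a hypothetical extra config dir. -->
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,/etc/alternative-conf,$HADOOP_COMMON_HOME/share/hadoop/common/*</value>
</property>
{code}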