[jira] [Created] (YARN-5679) TestAHSWebServices is failing
Akira Ajisaka created YARN-5679:
------------------------------------

             Summary: TestAHSWebServices is failing
                 Key: YARN-5679
                 URL: https://issues.apache.org/jira/browse/YARN-5679
             Project: Hadoop YARN
          Issue Type: Bug
          Components: timelineserver
            Reporter: Akira Ajisaka

TestAHSWebServices.testContainerLogsForFinishedApps is failing.
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/testReport/

{noformat}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.createContainerLogInLocalDir(TestAHSWebServices.java:675)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testContainerLogsForFinishedApps(TestAHSWebServices.java:581)
{noformat}

{noformat}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testContainerLogsForFinishedApps(TestAHSWebServices.java:519)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-5678) Misleading log message in FSLeafQueue
Yufei Gu created YARN-5678:
------------------------------

             Summary: Misleading log message in FSLeafQueue
                 Key: YARN-5678
                 URL: https://issues.apache.org/jira/browse/YARN-5678
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 3.0.0-alpha1
            Reporter: Yufei Gu
            Assignee: Yufei Gu

{code}
private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
  sched.updateDemand();
  Resource toAdd = sched.getDemand();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
        + "; Total resource consumption for " + getName() + " now " + demand);
  }
  demand = Resources.add(demand, toAdd);
  demand = Resources.componentwiseMin(demand, maxRes);
}
{code}

Change "resource consumption" in the debug message to "resource demand".
[jira] [Created] (YARN-5677) RM can be in active-active state for an extended period
Daniel Templeton created YARN-5677:
--------------------------------------

             Summary: RM can be in active-active state for an extended period
                 Key: YARN-5677
                 URL: https://issues.apache.org/jira/browse/YARN-5677
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0-alpha1, 2.7.3
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton
            Priority: Critical

Both branch-2.8/trunk and branch-2.7 have issues when the active RM loses contact with the ZK node(s).

In branch-2.7, the RM will retry the connection 1000 times by default. Attempting to contact a node which cannot be reached is slow, which means the active can take over an hour to realize it is no longer active. I clocked it at about an hour and a half in my tests. The solution appears to be to add some time awareness into the retry loop.

In branch-2.8/trunk, there is no maximum number of retries that I can see. It appears the connection will be retried forever, with the active never figuring out it's no longer active. I have a test running, and I'll update this description with empirical findings when I'm done. The solution appears to be to cap the number of retries or the amount of time spent retrying.

This issue is significant because of the asynchronous nature of job submission. If the active doesn't know it's not active, it will buffer up job submissions until it finally realizes it has become the standby. Then it will fail all the buffered job submissions in bulk. In high-volume workflows, that behavior can cause huge mass job failures.
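The time-aware retry idea suggested above can be sketched roughly as follows. This is a minimal illustration, not the actual ZKRMStateStore code; the class and method names are hypothetical:

```java
import java.util.concurrent.Callable;

// Retry an operation with a cap on both the attempt count and the total
// elapsed time, so a dead ZK connection cannot keep the RM believing it
// is active for hours.
public class TimeBoundedRetry {
  public static <T> T retry(Callable<T> op, int maxRetries, long maxMillis)
      throws Exception {
    long deadline = System.currentTimeMillis() + maxMillis;
    Exception last = null;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;
      }
      if (System.currentTimeMillis() >= deadline) {
        break; // time budget exhausted: give up even if retries remain
      }
    }
    throw last == null ? new IllegalStateException("no attempts made") : last;
  }
}
```

With such a cap, the standby transition happens no later than `maxMillis` after the first failed attempt, regardless of how slowly each connection attempt times out.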
[jira] [Created] (YARN-5676) Add a HashBasedRouterPolicy, that routes jobs based on queue name hash.
Carlo Curino created YARN-5676:
----------------------------------

             Summary: Add a HashBasedRouterPolicy, that routes jobs based on queue name hash.
                 Key: YARN-5676
                 URL: https://issues.apache.org/jira/browse/YARN-5676
             Project: Hadoop YARN
          Issue Type: Sub-task
    Affects Versions: YARN-2915
            Reporter: Carlo Curino
            Assignee: Carlo Curino
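As a rough illustration of the idea (the class and method names below are hypothetical, not the YARN federation API), a hash-based router maps a queue name deterministically onto one of the known sub-clusters:

```java
import java.util.List;

// Pick a sub-cluster by hashing the queue name, so jobs for the same queue
// are always routed to the same sub-cluster without any shared routing state.
public class HashBasedRouter {
  public static String route(String queueName, List<String> subClusters) {
    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    int idx = Math.floorMod(queueName.hashCode(), subClusters.size());
    return subClusters.get(idx);
  }
}
```

The appeal of this policy is statelessness: any Router instance computes the same answer for the same queue, at the cost of no load awareness.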
Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/

[Sep 26, 2016 12:42:22 PM] (kai.zheng) HADOOP-13584. hdoop-aliyun: merge HADOOP-12756 branch back.
[Sep 26, 2016 6:00:01 AM] (aajisaka) YARN-5663. Small refactor in ZKRMStateStore. Contributed by Oleksii

-1 overall

The following subsystems voted -1:
    compile unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc javac

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests:
       hadoop.hdfs.TestBlockStoragePolicy
       hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer
       hadoop.hdfs.web.TestWebHdfsTimeouts
       hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
       hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService
       hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
       hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager
       hadoop.yarn.server.timeline.TestRollingLevelDB
       hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
       hadoop.yarn.server.timeline.TestTimelineDataManager
       hadoop.yarn.server.timeline.TestLeveldbTimelineStore
       hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore
       hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore
       hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
       hadoop.yarn.server.timelineservice.storage.common.TestRowKeys
       hadoop.yarn.server.timelineservice.storage.common.TestKeyConverters
       hadoop.yarn.server.timelineservice.storage.common.TestSeparator
       hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
       hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
       hadoop.yarn.server.resourcemanager.TestRMRestart
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing
       hadoop.yarn.server.resourcemanager.TestResourceTrackerService
       hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
       hadoop.yarn.server.TestContainerManagerSecurity
       hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore
       hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient
       hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore
       hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorage
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun
       hadoop.yarn.server.timelineservice.storage.TestPhoenixOfflineAggregationWriterImpl
       hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity
       hadoop.yarn.applications.distributedshell.TestDistributedShell
       hadoop.mapred.TestShuffleHandler
       hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService
       hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM

    Timed out junit tests:
       org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache

compile:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-compile-root.txt [308K]

cc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-compile-root.txt [308K]

javac:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-compile-root.txt [308K]

unit:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [196K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [56K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt [56K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt [20K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [76K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/106/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt [268K]
    https://builds.apache.org/job/hadoop-qbt-trunk-ja
[jira] [Created] (YARN-5675) Checkin swagger definition in the repo
Gour Saha created YARN-5675:
-------------------------------

             Summary: Checkin swagger definition in the repo
                 Key: YARN-5675
                 URL: https://issues.apache.org/jira/browse/YARN-5675
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Gour Saha

This task will be used to submit the REST API swagger definition (YAML format) to be checked in to the repo.
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/

[Sep 25, 2016 12:06:30 PM] (naganarasimha_gr) YARN-3877. YarnClientImpl.submitApplication swallows exceptions.
[Sep 26, 2016 12:42:22 PM] (kai.zheng) HADOOP-13584. hdoop-aliyun: merge HADOOP-12756 branch back.
[Sep 26, 2016 6:00:01 AM] (aajisaka) YARN-5663. Small refactor in ZKRMStateStore. Contributed by Oleksii

-1 overall

The following subsystems voted -1:
    asflicense unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests:
       hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
       hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
       hadoop.yarn.server.TestContainerManagerSecurity

cc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-compile-cc-root.txt [4.0K]

javac:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-compile-javac-root.txt [172K]

checkstyle:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-checkstyle-root.txt [16M]

pylint:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-patch-pylint.txt [16K]

shellcheck:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-patch-shellcheck.txt [20K]

shelldocs:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-patch-shelldocs.txt [16K]

whitespace:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/whitespace-eol.txt [11M]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/whitespace-tabs.txt [1.3M]

javadoc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/diff-javadoc-javadoc-root.txt [2.2M]

unit:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt [12K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt [268K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-nativetask.txt [120K]

asflicense:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/176/artifact/out/patch-asflicense-problems.txt [4.0K]

Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org
[jira] [Created] (YARN-5674) FairScheduler handles "dots" in user names inconsistently in the config
Wilfred Spiegelenburg created YARN-5674:
-------------------------------------------

             Summary: FairScheduler handles "dots" in user names inconsistently in the config
                 Key: YARN-5674
                 URL: https://issues.apache.org/jira/browse/YARN-5674
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.6.0
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg

A user name can contain a dot. Because the user name can also be used as a queue name, we replace the dot with a defined separator: when defining queues in the configuration for users containing a dot, the dot is expected to be replaced by the "\_dot\_" string. In the user limits we do not do that; user limits require a normal dot in the user name.

This is confusing when you create a scheduler configuration: in some places you need to replace the dot, in others you do not. It can also cause user limits not to be enforced as expected.

We should use one way to specify the user. Since the queue naming cannot be changed, we should use the same "\_dot\_" in the user limits as well, and enforce it correctly.
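The single escaping rule argued for above might look like the following sketch. The helper class is hypothetical, not part of FairScheduler; it only illustrates applying one consistent substitution on both paths:

```java
// Apply one consistent "dot" escaping rule, both when a user name is turned
// into a queue name and when it is matched against user-limit entries.
public class UserNameEscaper {
  private static final String DOT_REPLACEMENT = "_dot_";

  // "first.last" -> "first_dot_last", suitable for use as a queue name.
  public static String escape(String userName) {
    return userName.replace(".", DOT_REPLACEMENT);
  }

  // Inverse mapping, back to the real user name.
  public static String unescape(String escaped) {
    return escaped.replace(DOT_REPLACEMENT, ".");
  }
}
```

If both queue placement and user-limit lookup went through `escape`, the configuration would only ever contain the "\_dot\_" form and the inconsistency would disappear.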
[jira] [Created] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability
Varun Vasudev created YARN-5673:
-----------------------------------

             Summary: [Umbrella] Re-write container-executor to improve security, extensibility, and portability
                 Key: YARN-5673
                 URL: https://issues.apache.org/jira/browse/YARN-5673
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: nodemanager
            Reporter: Varun Vasudev
            Assignee: Varun Vasudev

As YARN adds support for new features that require administrator privileges (such as support for network throttling and Docker), we've had to add new capabilities to the container-executor. This has led to a recognition that the current container-executor security features, as well as the code, could be improved. The current code is fragile and it's hard to add new features without causing regressions. Some of the improvements that need to be made are:

*Security*

Currently the container-executor has limited security features. It relies primarily on the permissions set on the binary but does little additional security beyond that. A few outstanding issues today:
- No audit log
- No way to disable features - network throttling and Docker support are built in and there's no way to turn them off at a container-executor level
- Code can be improved - a lot of the code switches users back and forth in an arbitrary manner
- No input validation - the paths and files provided at invocation are not validated or required to be in some specific location
- No signing functionality - there is no way to enforce that the binary was invoked by the NM and not by any other process

*Code Issues*

The code layout and implementation themselves can be improved. Some issues there are:
- No support for log levels - everything is logged and this can't be turned on or off
- Extremely long set of invocation parameters (specifically during container launch), which makes turning features on or off complicated
- Poor test coverage - it's easy to introduce regressions today due to the lack of a proper test setup
- Duplicate functionality - there is some amount of code duplication
- Hard to make improvements or add new features due to the issues raised above

*Portability*
- The container-executor mixes platform-dependent APIs with platform-independent APIs, making it hard to run on multiple platforms. Allowing it to run on multiple platforms also improves the overall code structure.

One option is to improve the existing container-executor; however, it might be easier to start from scratch. That allows existing functionality to be supported until we are ready to switch to the new code.

This umbrella JIRA is to capture all the work required for the new code. I'm going to work on a design doc for the changes - any suggestions or improvements are welcome.
[jira] [Created] (YARN-5672) FairScheduler: wrong queue name in log when adding application
Wilfred Spiegelenburg created YARN-5672:
-------------------------------------------

             Summary: FairScheduler: wrong queue name in log when adding application
                 Key: YARN-5672
                 URL: https://issues.apache.org/jira/browse/YARN-5672
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.6.0
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg
            Priority: Minor

The FairScheduler logs the passed-in queue name when adding an application, instead of the queue returned by the policy. Later log entries show the correct info:

{code}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1471982804173_6181 from user: wilfred, in queue: default, currently num of applications: 1
...
INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1471982804173_6181,name=oozie:launcher:XXX,user=wilfred,queue=root.wilfred,state=FAILED,trackingUrl=https://10.10.10.10:8088/cluster/app/application_1471982804173_6181,appMasterHost=N/A,startTime=1473580802079,finishTime=1473580809148,finalStatus=FAILED
{code}
[jira] [Resolved] (YARN-5671) Add support for Docker image clean up
[ https://issues.apache.org/jira/browse/YARN-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhankun Tang resolved YARN-5671.
--------------------------------
    Resolution: Duplicate

> Add support for Docker image clean up
> -------------------------------------
>
>                 Key: YARN-5671
>                 URL: https://issues.apache.org/jira/browse/YARN-5671
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Zhankun Tang
>
> Regarding Docker image localization, we also need a way to clean up old/stale Docker images to save storage space. We may extend the deletion service to utilize "docker rmi" to do this.
> This is related to YARN-3854 and may depend on its implementation. Please refer to YARN-3854 for Docker image localization details.
[jira] [Created] (YARN-5671) Add support for Docker image clean up
Zhankun Tang created YARN-5671:
----------------------------------

             Summary: Add support for Docker image clean up
                 Key: YARN-5671
                 URL: https://issues.apache.org/jira/browse/YARN-5671
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: yarn
            Reporter: Zhankun Tang

Regarding Docker image localization, we also need a way to clean up old/stale Docker images to save storage space. We may extend the deletion service to utilize "docker rmi" to do this.

This is related to YARN-3854 and may depend on its implementation. Please refer to YARN-3854 for Docker image localization details.
[jira] [Created] (YARN-5670) Add support for Docker image clean up
Zhankun Tang created YARN-5670:
----------------------------------

             Summary: Add support for Docker image clean up
                 Key: YARN-5670
                 URL: https://issues.apache.org/jira/browse/YARN-5670
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: yarn
            Reporter: Zhankun Tang

Regarding Docker image localization, we also need a way to clean up old/stale Docker images to save storage space. We may extend the deletion service to utilize "docker rmi" to do this.

This is related to YARN-3854 and may depend on its implementation. Please refer to YARN-3854 for Docker image localization details.
[jira] [Created] (YARN-5669) Add support for Docker pull
Zhankun Tang created YARN-5669:
----------------------------------

             Summary: Add support for Docker pull
                 Key: YARN-5669
                 URL: https://issues.apache.org/jira/browse/YARN-5669
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: yarn
            Reporter: Zhankun Tang

We need to add "docker pull" to support Docker image localization. Refer to YARN-3854 for the details.
[jira] [Created] (YARN-5668) content-type: application/json support in /conf endpoint
Sreenath Somarajapuram created YARN-5668:
--------------------------------------------

             Summary: content-type: application/json support in /conf endpoint
                 Key: YARN-5668
                 URL: https://issues.apache.org/jira/browse/YARN-5668
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Sreenath Somarajapuram

Right now, all requests sent to /conf are answered with XML data. It would be great if the endpoint checked the request headers and returned data in JSON format when content-type: application/json is requested.
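The requested behaviour amounts to simple content negotiation on the incoming header. A minimal sketch of the decision (illustrative only; the class name is hypothetical and the real /conf endpoint would do this in its servlet or JAX-RS layer):

```java
// Choose the response media type for /conf from the request header:
// JSON when the client asks for application/json, XML otherwise
// (XML being the current default behaviour of the endpoint).
public class ConfFormatNegotiator {
  public static String pickResponseType(String requestedType) {
    if (requestedType != null && requestedType.contains("application/json")) {
      return "application/json";
    }
    return "application/xml";
  }
}
```

Defaulting to XML keeps existing clients, which send no header at all, working unchanged.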