[jira] [Commented] (YARN-10002) Code cleanup and improvements in ConfigurationStoreBaseTest
[ https://issues.apache.org/jira/browse/YARN-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080250#comment-17080250 ]

Hadoop QA commented on YARN-10002:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 10m 50s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || branch-3.2 Compile Tests ||
| +1 | mvninstall | 28m 27s | branch-3.2 passed |
| -1 | compile | 0m 18s | hadoop-yarn-server-resourcemanager in branch-3.2 failed. |
| +1 | checkstyle | 0m 40s | branch-3.2 passed |
| +1 | mvnsite | 0m 53s | branch-3.2 passed |
| -1 | shadedclient | 2m 31s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 23s | branch-3.2 passed |
| +1 | javadoc | 0m 35s | branch-3.2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| -1 | javac | 0m 37s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 17 new + 0 unchanged - 0 fixed = 17 total (was 0) |
| -0 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 2 unchanged - 4 fixed = 6 total (was 6) |
| +1 | mvnsite | 0m 37s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 11s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 391m 25s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 455m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue |
| | hadoop.yarn.server.resourcemanager.scheduler.policy.TestFairOrderingPolicy |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestSchedulingRequestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerWithMultiResourceTypes |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
[jira] [Commented] (YARN-10223) Duplicate jersey-test-framework-core dependency in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080158#comment-17080158 ]

Akira Ajisaka commented on YARN-10223:
--------------------------------------

It's not critical, so I targeted this to 3.3.1.

> Duplicate jersey-test-framework-core dependency in yarn-server-common
> ----------------------------------------------------------------------
>
> Key: YARN-10223
> URL: https://issues.apache.org/jira/browse/YARN-10223
> Project: Hadoop YARN
> Issue Type: Bug
> Components: build
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Minor
>
> The following warning appears in the maven log.
> {noformat}
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique:
> com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar ->
> version (?) vs 1.19 @ line 148, column 17
> {noformat}
[jira] [Updated] (YARN-7558) "yarn logs" command fails to get logs for running containers if UI authentication is enabled.
[ https://issues.apache.org/jira/browse/YARN-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated YARN-7558:
-----------------------------
    Reporter: Namit Maheshwari  (was: Namit Maheshwari)

> "yarn logs" command fails to get logs for running containers if UI authentication is enabled.
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-7558
> URL: https://issues.apache.org/jira/browse/YARN-7558
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Namit Maheshwari
> Assignee: Xuan Gong
> Priority: Critical
> Fix For: 3.1.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7558.1.patch, YARN-7558.2.patch
[jira] [Updated] (YARN-10002) Code cleanup and improvements in ConfigurationStoreBaseTest
[ https://issues.apache.org/jira/browse/YARN-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-10002:
---------------------------------
    Attachment: YARN-10002.branch-3.2.001.patch

> Code cleanup and improvements in ConfigurationStoreBaseTest
> ------------------------------------------------------------
>
> Key: YARN-10002
> URL: https://issues.apache.org/jira/browse/YARN-10002
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Benjamin Teke
> Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-10002.001.patch, YARN-10002.002.patch, YARN-10002.003.patch, YARN-10002.004.patch, YARN-10002.005.patch, YARN-10002.006.patch, YARN-10002.branch-3.2.001.patch
>
> * Some protected fields could be package-private
> * Could add a helper method that prepares a simple LogMutation with 1, 2 or 3 updates (Key + value) as this pattern is used extensively in subclasses
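A minimal sketch of the helper suggested above, assuming LogMutation takes a key-to-value map plus a user name (as in YarnConfigurationStore.LogMutation); the class name and TEST_USER constant are placeholders, not the actual patch:

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.YarnConfigurationStore.LogMutation;

public abstract class ConfigurationStoreBaseTestSketch {
  private static final String TEST_USER = "testUser"; // placeholder

  // Builds a LogMutation from alternating key/value arguments, so a
  // subclass can write prepareLogMutation("k1", "v1", "k2", "v2").
  protected LogMutation prepareLogMutation(String... keysValues) {
    if (keysValues.length % 2 != 0) {
      throw new IllegalArgumentException("Expected key/value pairs");
    }
    Map<String, String> updates = new HashMap<>();
    for (int i = 0; i < keysValues.length; i += 2) {
      updates.put(keysValues[i], keysValues[i + 1]);
    }
    return new LogMutation(updates, TEST_USER);
  }
}
{code}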
[jira] [Updated] (YARN-5625) FairScheduler should use FSContext more aggressively to avoid constructors with many parameters
[ https://issues.apache.org/jira/browse/YARN-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5625:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> FairScheduler should use FSContext more aggressively to avoid constructors with many parameters
> ------------------------------------------------------------------------------------------------
>
> Key: YARN-5625
> URL: https://issues.apache.org/jira/browse/YARN-5625
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 2.9.0
> Reporter: Karthik Kambatla
> Priority: Major
>
> YARN-5609 introduces FSContext, a structure to capture basic FairScheduler information. In addition to preemption details, it could host references to the scheduler, QueueManager, AllocationConfiguration etc.
[jira] [Updated] (YARN-4843) [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64
[ https://issues.apache.org/jira/browse/YARN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4843:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64
> ----------------------------------------------------------------------------------
>
> Key: YARN-4843
> URL: https://issues.apache.org/jira/browse/YARN-4843
> Project: Hadoop YARN
> Issue Type: Bug
> Components: api
> Affects Versions: 3.0.0-alpha1
> Reporter: Wangda Tan
> Priority: Major
>
> This JIRA is to track all int32 usages in YARN's ProtocolBuffer APIs that we possibly need to update to int64.
> One example is the resource API. We use int32 for memory now; if a cluster has 10k nodes, each with 210G memory, we will get a negative total cluster memory.
> We may have other fields that need to be upgraded from int32 to int64.
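A quick check of the overflow example (memory is tracked in MB; int32 max is 2^31 - 1):

{noformat}
210 GB/node = 215,040 MB/node
215,040 MB/node x 10,000 nodes = 2,150,400,000 MB
int32 max = 2,147,483,647, so the int32 total wraps to a negative value
{noformat}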
[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5465:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> -----------------------------------------------------------------
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: graceful
> Reporter: Robert Kanter
> Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes updates the timeout for any currently decommissioning nodes. This makes it impossible to gracefully decommission different sets of nodes with different timeouts, though it does let you easily update the timeout of currently decommissioning nodes.
> Another behavior we could use is this (steps 1-6 as above):
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes independent. You can now have different sets of decommissioning nodes with different timeouts. However, to update the timeout of a currently decommissioning node, you'd have to first recommission it and then decommission it again.
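A minimal sketch of the second behavior described above, tracking one deadline per node so a later {{-refreshNodes}} call cannot reset nodes already draining; all class and method names here are illustrative, not the RM's actual implementation:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: one deadline per decommissioning node, recorded
// when the node first appears in the exclude list and never reset by
// later -refreshNodes calls.
class DecommissionDeadlines {
  private final Map<String, Long> deadlineByNode = new ConcurrentHashMap<>();

  void beginGracefulDecommission(String nodeId, long timeoutMs) {
    // putIfAbsent keeps the original deadline for nodes already draining.
    deadlineByNode.putIfAbsent(nodeId, System.currentTimeMillis() + timeoutMs);
  }

  boolean shouldForceShutdown(String nodeId) {
    Long deadline = deadlineByNode.get(nodeId);
    return deadline != null && System.currentTimeMillis() >= deadline;
  }

  void recommission(String nodeId) {
    // Updating a timeout requires recommissioning first, as noted above.
    deadlineByNode.remove(nodeId);
  }
}
{code}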
[jira] [Updated] (YARN-4637) AM launching blacklist purge mechanism (time based)
[ https://issues.apache.org/jira/browse/YARN-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4637:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> AM launching blacklist purge mechanism (time based)
> ----------------------------------------------------
>
> Key: YARN-4637
> URL: https://issues.apache.org/jira/browse/YARN-4637
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Sunil G
> Priority: Major
[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker
[ https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5414:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> -------------------------------------------------------
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: container-queuing, distributed-scheduling, scheduler
> Reporter: Arun Suresh
> Assignee: Abhishek Modi
> Priority: Major
>
> The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of maintaining its own data-structure of node information.
[jira] [Updated] (YARN-4944) Handle lack of ResourceCalculatorPlugin gracefully
[ https://issues.apache.org/jira/browse/YARN-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4944:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Handle lack of ResourceCalculatorPlugin gracefully
> ---------------------------------------------------
>
> Key: YARN-4944
> URL: https://issues.apache.org/jira/browse/YARN-4944
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Priority: Major
> Labels: newbie++
>
> On some systems (e.g. mac), the NM might not be able to instantiate a ResourceCalculatorPlugin, which leads to logging a bunch of error messages. We could improve the way we handle this.
[jira] [Updated] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout
[ https://issues.apache.org/jira/browse/YARN-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5536:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout
> -----------------------------------------------------------------------------------------------------
>
> Key: YARN-5536
> URL: https://issues.apache.org/jira/browse/YARN-5536
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: graceful
> Reporter: Junping Du
> Priority: Major
>
> Per discussion in YARN-4676, we agree that multiple formats (other than xml) should be supported to decommission nodes with timeout values.
[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-1426:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> YARN Components need to unregister their beans upon shutdown
> -------------------------------------------------------------
>
> Key: YARN-1426
> URL: https://issues.apache.org/jira/browse/YARN-1426
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.3.0, 3.0.0-alpha1
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Major
> Labels: oct16-easy
>
> Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch
[jira] [Updated] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4953:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Delete completed container log folder when rolling log aggregation is enabled
> -------------------------------------------------------------------------------
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Major
>
> There is a potential bottleneck when a cluster is running with a very large number of containers on the same NodeManager for a single application. Linux limits the subfolder count to 32K, so if the number of containers is greater than 32K for an application, container launches will fail; at this point no more containers can be launched on this node.
> Currently log folders are deleted after the app is finished, while rolling log aggregation aggregates logs to HDFS periodically.
> I think that if aggregation is completed for finished containers, then cleanup can be done, i.e., deleting the log folders of finished containers.
[jira] [Updated] (YARN-9883) Reshape SchedulerHealth class
[ https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-9883:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Reshape SchedulerHealth class
> ------------------------------
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, yarn
> Affects Versions: 3.3.0
> Reporter: Adam Antal
> Assignee: Kinga Marton
> Priority: Minor
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its objects are package-private: they should be private
> - The internal maps should be (Concurrent) EnumMaps instead of HashMaps: they are more efficient in storing Enums
> - schedulerHealthDetails only stores the last operation, its name should reflect that (just like lastSchedulerRunDetails)
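To illustrate the EnumMap point: an EnumMap is backed by an array indexed on the enum's ordinal, so lookups skip hashing entirely. A minimal sketch (the Operation enum and counter field are illustrative, not the actual SchedulerHealth members):

{code:java}
import java.util.EnumMap;
import java.util.Map;

class SchedulerHealthSketch {
  // Stand-in for the scheduler operation types tracked by SchedulerHealth.
  enum Operation { ALLOCATION, RELEASE, PREEMPTION, RESERVATION }

  // EnumMap stores values in an array indexed by Operation.ordinal(),
  // avoiding hashCode()/equals() calls and HashMap node overhead.
  private final Map<Operation, Long> operationCounts =
      new EnumMap<>(Operation.class);

  void recordOperation(Operation op) {
    operationCounts.merge(op, 1L, Long::sum);
  }
}
{code}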
[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-2024:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: log-aggregation
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Eric Payne
> Assignee: Xuan Gong
> Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a message was printed, but no stacktrace was provided. Message: "ERROR: Couldn't upload logs for container_n_nnn_nn_nn. Skipping this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to LogWriter#append fail with the following stacktrace:
> {noformat}
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
>     at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
>     at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
>     ...
> {noformat}
> - At this point, the yarn-logs cleaner still thinks the thread is aggregating, so the huge yarn-logs never get cleaned up for that application.
[jira] [Updated] (YARN-4969) Fix more loggings in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4969:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Fix more loggings in CapacityScheduler
> ---------------------------------------
>
> Key: YARN-4969
> URL: https://issues.apache.org/jira/browse/YARN-4969
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Labels: oct16-easy
> Attachments: YARN-4969.1.patch
>
> YARN-3966 did logging cleanup for the Capacity Scheduler before; however, there are some loggings we still need to improve:
> Container allocation / complete / reservation / un-reserve messages for every hierarchy (app/leaf/parent-queue) should be printed at INFO level.
> I'm debugging one issue where the root queue's resource usage could be negative. It is very hard to reproduce, so we cannot enable debug logging from RM start; the log would not fit on a single disk.
> The existing CS prints an INFO message when a container cannot be allocated, such as on re-reservation / node heartbeat, etc. We should avoid printing such messages at INFO level.
[jira] [Updated] (YARN-10032) Implement regex querying of logs
[ https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-10032:
----------------------------------------
    Target Version/s: 3.4.0

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Implement regex querying of logs
> ---------------------------------
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 3.2.1
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Major
> Fix For: 3.3.0
>
> After YARN-10031, we have query parameters on the log servlet's GET endpoint. To demonstrate the new capabilities of the log servlet and how easy it is to add a functionality to all log servlets at the same time, let's add the ability to search the aggregated logs with a given regex.
> A conceptual use case: a user runs several MR jobs daily, but some of them fail to localize a particular resource at first. We want to search the logs of these YARN applications and extract some data from them.
[jira] [Updated] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.
[ https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-2684:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.5.1
> Reporter: Karthik Kambatla
> Priority: Major
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
> YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS.
[jira] [Updated] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort
[ https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-1946:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> need Public interface for WebAppUtils.getProxyHostAndPort
> ----------------------------------------------------------
>
> Key: YARN-1946
> URL: https://issues.apache.org/jira/browse/YARN-1946
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, webapp
> Affects Versions: 2.4.0
> Reporter: Thomas Graves
> Priority: Major
>
> ApplicationMasters are supposed to go through the ResourceManager web app proxy if they have web UIs so they are properly secured. There is currently no public interface for ApplicationMasters to conveniently get the proxy host and port. There is a function in WebAppUtils, but that class is private.
> We should provide this as a utility since any properly written AM will need to do this.
[jira] [Updated] (YARN-4638) Node whitelist support for AM launching
[ https://issues.apache.org/jira/browse/YARN-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4638:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Node whitelist support for AM launching
> ----------------------------------------
>
> Key: YARN-4638
> URL: https://issues.apache.org/jira/browse/YARN-4638
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Major
[jira] [Updated] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.
[ https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4636:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Make blacklist tracking policy pluggable for more extensions.
> ---------------------------------------------------------------
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Sunil G
> Priority: Major
[jira] [Updated] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters
[ https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4971:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> ---------------------------------------------------------------------------
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.2
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Attachments: YARN-4971.1.patch
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, the first time the service becomes active binding to the wildcard works as expected. If the service has transitioned from active to standby and then becomes active again after failover, the service only binds to one of the IP addresses.
> There is a difference between the services inside the RM: it only seems to happen for the services listening on ports 8030 and 8032.
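For reference, the setting under discussion is a standard yarn-site.xml entry; the expectation is that every standby-to-active transition re-binds to the wildcard address:

{code:xml}
<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
  <!-- Bind RM server sockets (scheduler on 8030, client RPC on 8032,
       etc.) to all interfaces rather than the hostname's address. -->
</property>
{code}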
[jira] [Updated] (YARN-4808) SchedulerNode can use a few more cosmetic changes
[ https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4808:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> SchedulerNode can use a few more cosmetic changes
> ---------------------------------------------------
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Major
> Attachments: yarn-4808-1.patch, yarn-4808-2.patch
>
> We have made some cosmetic changes to SchedulerNode recently. While working on YARN-4511, we realized we could improve it a little more:
> # Remove volatile variables - don't see the need for them being volatile
> # Some methods end up doing very similar things, so consolidating them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity to include the un-utilized resources, and having two totals can be a little confusing.
[jira] [Updated] (YARN-2014) Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-2014:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9
> ----------------------------------------------------------------------
>
> Key: YARN-2014
> URL: https://issues.apache.org/jira/browse/YARN-2014
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: patrick white
> Assignee: Jason Darrell Lowe
> Priority: Major
>
> Performance comparison benchmarks from 2.x against 0.23 show the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are:
> 2.4.0.0 runtime 255.6 seconds (avg 5 passes)
> 0.23.9.12 runtime 230.4 seconds (avg 5 passes)
> Diff: -9.9%
> The AM Scalability test is essentially a sleep job that measures the time to launch and complete a large number of mappers.
> The diff is consistent and has been reproduced in both a larger (350 node, 100,000 mappers) perf environment, as well as a small (10 node, 2,900 mappers) demo cluster.
[jira] [Updated] (YARN-7578) Extend TestDiskFailures.waitForDiskHealthCheck() sleeping time.
[ https://issues.apache.org/jira/browse/YARN-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-7578:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Extend TestDiskFailures.waitForDiskHealthCheck() sleeping time.
> ----------------------------------------------------------------
>
> Key: YARN-7578
> URL: https://issues.apache.org/jira/browse/YARN-7578
> Project: Hadoop YARN
> Issue Type: Test
> Affects Versions: 3.1.0
> Environment: ARMv8 AArch64, Ubuntu16.04
> Reporter: Guangming Zhang
> Priority: Minor
> Labels: dtest, patch, test
> Attachments: YARN-7578.0.patch
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The Thread.sleep() function is called to wait for the NodeManager to identify disk failures. But in some cases, for example on lower-end hardware, the sleep time is too short, so the NodeManager may not have finished identifying disk failures. This causes test errors:
> {code:java}
> Running org.apache.hadoop.yarn.server.TestDiskFailures
> Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 17.686 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.TestDiskFailures
> testLocalDirsFailures(org.apache.hadoop.yarn.server.TestDiskFailures)  Time elapsed: 10.412 sec  <<< FAILURE!
> java.lang.AssertionError: NodeManager could not identify disk failure.
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:186)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> testLogDirsFailures(org.apache.hadoop.yarn.server.TestDiskFailures)  Time elapsed: 5.99 sec  <<< FAILURE!
> java.lang.AssertionError: NodeManager could not identify disk failure.
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:186)
>     at org.apache.hadoop.yarn.server.TestDiskFailures.testLogDirsFailures(TestDiskFailures.java:111)
> {code}
> So extend the sleep time from 1000ms to 1500ms to avoid some unit test errors.
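The patch itself is a one-line timing change; a sketch of the idea (the exact surrounding test context is assumed from the report above):

{code:java}
// In TestDiskFailures: wait for the NM's disk health checker to notice
// the injected disk failure before asserting.
private void waitForDiskHealthCheck() throws InterruptedException {
  Thread.sleep(1500); // was 1000 ms; too short on lower-end hardware
}
{code}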
[jira] [Updated] (YARN-5674) FairScheduler handles "dots" in user names inconsistently in the config
[ https://issues.apache.org/jira/browse/YARN-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5674:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> FairScheduler handles "dots" in user names inconsistently in the config
> -------------------------------------------------------------------------
>
> Key: YARN-5674
> URL: https://issues.apache.org/jira/browse/YARN-5674
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.6.0
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
>
> A user name can contain a dot; because it could be used as the queue name, we replace the dot with a defined separator. When defining queues in the configuration for users containing a dot, we expect that the dot is replaced by the "\_dot\_" string.
> In the user limits we do not do that, and user limits need a normal dot in the user name. This is confusing: when you create a scheduler configuration, in some places you need to replace the dot and in others you do not. This can cause issues where user limits are not enforced as expected.
> We should use one way to specify the user, and since the queue naming cannot be changed, we should also use the same "\_dot\_" in the user limits and enforce them correctly.
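An illustrative allocation-file fragment showing the inconsistency for a user named first.last (the queue name and limits are made up):

{code:xml}
<allocations>
  <!-- Queue naming: the dot in the user name must be written with
       the "_dot_" separator. -->
  <queue name="root.first_dot_last">
    <maxRunningApps>10</maxRunningApps>
  </queue>
  <!-- User limit: the same user must be written with a literal dot. -->
  <user name="first.last">
    <maxRunningApps>5</maxRunningApps>
  </user>
</allocations>
{code}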
[jira] [Updated] (YARN-10138) Document the new JHS API
[ https://issues.apache.org/jira/browse/YARN-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-10138:
----------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Document the new JHS API
> -------------------------
>
> Key: YARN-10138
> URL: https://issues.apache.org/jira/browse/YARN-10138
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 3.3.0
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Major
>
> A new API has been introduced in YARN-10028, but we did not document it in the JHS API documentation. Let's add it.
[jira] [Updated] (YARN-10106) Yarn logs CLI filtering by application attempt
[ https://issues.apache.org/jira/browse/YARN-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-10106:
----------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Yarn logs CLI filtering by application attempt
> ------------------------------------------------
>
> Key: YARN-10106
> URL: https://issues.apache.org/jira/browse/YARN-10106
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 3.3.0
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Trivial
>
> {{ContainerLogsRequest}} got a new parameter in YARN-10101, which is the {{applicationAttempt}} - we can use this new parameter in the Yarn logs CLI as well to filter by application attempt.
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-867:
--------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Isolation of failures in aux services
> --------------------------------------
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Major
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch
>
> Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly.
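A minimal sketch of the isolation idea: catch everything a service handler throws instead of letting it escape to the dispatcher. The AuxService type below is a stand-in for the real NM classes, not the actual fix:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class AuxServiceEventIsolation {
  private static final Logger LOG =
      LoggerFactory.getLogger(AuxServiceEventIsolation.class);

  // Stand-in for the NM's auxiliary service abstraction.
  interface AuxService {
    String getName();
    void initializeApplication(Object context) throws Exception;
  }

  void dispatchInitApp(AuxService service, Object context) {
    try {
      service.initializeApplication(context);
    } catch (Throwable t) {
      // Catch Throwable, not just IOException: per the report, any
      // unchecked exception escaping here takes down the NM dispatcher.
      LOG.error("Aux service " + service.getName()
          + " failed to handle APPLICATION_INIT; isolating failure", t);
    }
  }
}
{code}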
[jira] [Updated] (YARN-4495) add a way to tell AM container increase/decrease request is invalid
[ https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4495:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> add a way to tell AM container increase/decrease request is invalid
> ---------------------------------------------------------------------
>
> Key: YARN-4495
> URL: https://issues.apache.org/jira/browse/YARN-4495
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, client
> Reporter: sandflee
> Priority: Major
> Labels: oct16-hard
> Attachments: YARN-4495.01.patch
>
> Now the RM may pass an InvalidResourceRequestException to the AM or just ignore the change request; the former will bring AMRMClientAsync down, and the latter will leave the AM waiting for the reply.
[jira] [Updated] (YARN-4485) [Umbrella] Capture per-application and per-queue container allocation latency
[ https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-4485:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> [Umbrella] Capture per-application and per-queue container allocation latency
> -------------------------------------------------------------------------------
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.7.1
> Reporter: Karthik Kambatla
> Priority: Major
> Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long way towards helping with tuning scheduler queue configs.
> This umbrella JIRA tracks adding these metrics.
[jira] [Updated] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush
[ https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-6382:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Address race condition on TimelineWriter.flush() caused by buffer-sized flush
> -------------------------------------------------------------------------------
>
> Key: YARN-6382
> URL: https://issues.apache.org/jira/browse/YARN-6382
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha2
> Reporter: Haibo Chen
> Assignee: Yousef Abu-Salah
> Priority: Major
>
> YARN-6376 fixes the race condition between putEntities() and periodical flush() by WriterFlushThread in TimelineCollectorManager, or between putEntities() in different threads.
> However, BufferedMutator can have internal size-based flush as well. We need to address the resulting race condition.
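For context, a sketch of the guard style YARN-6376 introduced: writers proceed under a read lock while an explicit flush() takes the write lock. The remaining problem this issue describes is that BufferedMutator may also flush internally when its buffer fills, outside any such lock. This is an assumption about the shape of the code, not the committed patch:

{code:java}
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Mutation;

class GuardedTimelineWriter {
  private final BufferedMutator mutator;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  GuardedTimelineWriter(BufferedMutator mutator) {
    this.mutator = mutator;
  }

  // putEntities() path: many writers may mutate concurrently.
  void write(Mutation mutation) throws IOException {
    lock.readLock().lock();
    try {
      mutator.mutate(mutation);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Periodic flush path: excludes writers while flushing.
  void flush() throws IOException {
    lock.writeLock().lock();
    try {
      mutator.flush();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}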
[jira] [Updated] (YARN-6488) Remove continuous scheduling tests
[ https://issues.apache.org/jira/browse/YARN-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-6488:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Remove continuous scheduling tests
> -----------------------------------
>
> Key: YARN-6488
> URL: https://issues.apache.org/jira/browse/YARN-6488
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
>
> Remove all continuous scheduling tests from the code
[jira] [Updated] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-8149:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> ----------------------------------------------------------
>
> Key: YARN-8149
> URL: https://issues.apache.org/jira/browse/YARN-8149
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Priority: Major
>
> Frankly speaking, I'm not sure why we need the re-reservation. The formula is not that easy to understand:
> Inside {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}}:
> {code:java}
> starvation = re-reservation / (#reserved-container *
>              (1 - min(requested-resource / max-alloc,
>                       max-alloc - min-alloc / max-alloc)))
> should_allocate = starvation + requiredContainers - reservedContainers > 0
> {code}
> I think we should be able to remove the starvation computation; just checking requiredContainers > reservedContainers should be enough.
> In a large cluster, we can easily overflow re-reservation to MAX_INT, see YARN-7636.
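The suggested simplification, as a sketch (the method shape is assumed from the issue text):

{code:java}
// Allocate or reserve a new container only while the application still
// needs more containers than it has already reserved; no starvation term.
boolean shouldAllocOrReserveNewContainer(int requiredContainers,
    int reservedContainers) {
  return requiredContainers > reservedContainers;
}
{code}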
[jira] [Updated] (YARN-6147) Blacklisting nodes not happening for AM containers
[ https://issues.apache.org/jira/browse/YARN-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-6147:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Blacklisting nodes not happening for AM containers
> ---------------------------------------------------
>
> Key: YARN-6147
> URL: https://issues.apache.org/jira/browse/YARN-6147
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Major
>
> Blacklisting of nodes is not happening in the following scenarios:
> 1. RMAppAttempt is in ALLOCATED and a LAUNCH_FAILED event comes when the NM is down.
> 2. RMAppAttempt is in LAUNCHED and an EXPIRE event comes when the NM is down.
> In both these cases the AppAttempt goes to *FINAL_SAVING* and eventually to *FINAL* state before the *CONTAINER_FINISHED* event is triggered by {{RMContainerImpl}}, and in the {{FINAL}} state the {{CONTAINER_FINISHED}} event is ignored.
[jira] [Updated] (YARN-8940) [CSI] Add volume as a top-level attribute in service spec
[ https://issues.apache.org/jira/browse/YARN-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-8940:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> [CSI] Add volume as a top-level attribute in service spec
> -----------------------------------------------------------
>
> Key: YARN-8940
> URL: https://issues.apache.org/jira/browse/YARN-8940
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Labels: CSI
>
> Initial thought:
> {noformat}
> {
>   "name": "volume example",
>   "version": "1.0.0",
>   "description": "a volume simple example",
>   "components": [
>     {
>       "name": "",
>       "number_of_containers": 1,
>       "artifact": {
>         "id": "docker.io/centos:latest",
>         "type": "DOCKER"
>       },
>       "launch_command": "sleep,120",
>       "configuration": {
>         "env": {
>           "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
>         }
>       },
>       "resource": {
>         "cpus": 1,
>         "memory": "256"
>       },
>       "volumes": [
>         {
>           "volume": {
>             "type": "s3_csi",
>             "id": "5504d4a8-b246-11e8-94c2-026b17aa1190",
>             "capability": {
>               "min": "5Gi",
>               "max": "100Gi"
>             },
>             "source_path": "s3://my_bucket/my",  # optional for object stores
>             "mount_path": "/mnt/data",           # required, the mount point in docker container
>             "access_mode": "SINGLE_READ"         # how the volume can be accessed
>           }
>         }
>       ]
>     }
>   ]
> }
> {noformat}
> Open for discussion.
[jira] [Updated] (YARN-8256) Pluggable provider for node membership management
[ https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-8256:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Pluggable provider for node membership management
> --------------------------------------------------
>
> Key: YARN-8256
> URL: https://issues.apache.org/jira/browse/YARN-8256
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Affects Versions: 2.8.3, 3.0.2
> Reporter: Dagang Wei
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> h1. Background
> HDFS-7541 introduced a pluggable provider framework for node membership management, which gives HDFS the flexibility to have different ways to manage node membership for different needs. [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java] is the class which provides the abstraction. Currently, there are 2 implementations in the HDFS codebase:
> 1) [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java], which uses 2 config files defined by the properties dfs.hosts and dfs.hosts.exclude.
> 2) [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java], which uses a single JSON file defined by the property dfs.hosts.
> dfs.namenode.hosts.provider.classname is the property determining which implementation is used.
> h1. Problem
> YARN should be consistent with HDFS in terms of pluggable providers for node membership management. The absence of one makes it impossible for YARN to have other config sources, e.g., ZooKeeper, databases, other config file formats, etc.
> h1. Proposed solution
> [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java] is the class managing YARN node membership today. It uses [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java] to read the config files specified by the property yarn.resourcemanager.nodes.include-path for nodes to include and yarn.resourcemanager.nodes.exclude-path for nodes to exclude.
> The proposed solution is to:
> 1) introduce a new interface HostsConfigManager which provides the abstraction for node membership management. Update NodesListManager to depend on HostsConfigManager instead of HostsFileReader. Then create a wrapper class for HostsFileReader which implements the interface.
> 2) introduce a new config property yarn.resourcemanager.hosts-config.manager.class for specifying the implementation class. Set the default value to the wrapper class of HostsFileReader for backward compatibility between new code and old config.
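A sketch of what the proposed interface might look like, modeled on HDFS-7541's HostConfigManager; the method set and the host-string parameter are assumptions, since the issue does not pin down a signature:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configurable;

public interface HostsConfigManager extends Configurable {
  /** Reload include/exclude membership from the backing source
      (file, ZooKeeper, database, ...). */
  void refresh() throws IOException;

  /** Whether the given node is allowed to register with the RM. */
  boolean isIncluded(String nodeHost);

  /** Whether the given node has been excluded (decommissioned). */
  boolean isExcluded(String nodeHost);
}
{code}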
[jira] [Updated] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path
[ https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-8366:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Expose debug log information when user intend to enable GPU without setting nvidia-smi path
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-8366
> URL: https://issues.apache.org/jira/browse/YARN-8366
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Major
>
> Expose debug information to help users find the root cause of failures when they have not made these two settings manually before enabling GPU on YARN:
> 1. yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables in yarn-site.xml
> 2. environment variable LD_LIBRARY_PATH
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5814:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Add druid as storage backend in YARN Timeline Service
> -------------------------------------------------------
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: ATSv2
> Affects Versions: 3.0.0-alpha2
> Reporter: Bingxue Qiu
> Priority: Major
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
> h3. Introduction
> I propose to add druid as a storage backend in YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in Alibaba clusters with thousands of nodes. We need to collect and store meta/events/metrics data, analyze utilization reports of various dimensions online, and display the trends of allocated/used resources for the cluster by joining and aggregating data. This helps us manage and optimize the cluster by tracking resource utilization.
> To achieve our goal, we have switched to druid as the storage instead of HBase, and have achieved sub-second OLAP performance in our production environment for a few months.
> h3. Analysis
> Currently YARN Timeline Service only supports aggregating metrics a) at flow level by FlowRunCoprocessor and b) at application level by AppLevelTimelineCollector; offline (time-based periodic) aggregation for flows/users/queues for reporting and analysis is planned but not yet implemented. YARN Timeline Service chooses Apache HBase as the primary storage backend. As we all know, HBase doesn't fit OLAP.
> For arbitrary exploration of data, such as online analysis of utilization reports across dimensions (Queue, Flow, Users, Application, CPU, Memory) by joining and aggregating data, Druid's custom column format enables ad-hoc queries without pre-computation. The format also enables fast scans on columns, which is important for good aggregation performance.
> To achieve our goal of online analysis of utilization reports across dimensions, displaying the variation trends of allocated/used resources for the cluster, and arbitrary exploration of data, we propose to add druid storage and implement DruidWriter/DruidReader in YARN Timeline Service.
[jira] [Updated] (YARN-2836) RM behaviour on token renewal failures is broken
[ https://issues.apache.org/jira/browse/YARN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-2836: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > RM behaviour on token renewal failures is broken > > > Key: YARN-2836 > URL: https://issues.apache.org/jira/browse/YARN-2836 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Major > > Found this while reviewing YARN-2834. > We now completely ignore token renewal failures. For things like Timeline > tokens, which are automatically obtained whether the app needs them or not (we > should fix this to be user driven), we can ignore failures. But for HDFS > tokens etc., ignoring failures is bad because it (1) wastes resources, as AMs > will continue and eventually fail, and (2) the app doesn't know what happened > when it eventually fails. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9415) Document FS placement rule changes from YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9415: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Document FS placement rule changes from YARN-8967 > - > > Key: YARN-9415 > URL: https://issues.apache.org/jira/browse/YARN-9415 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > With the changes introduced by YARN-8967 we now allow parent rules on all > existing rules. This should be documented. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4758) Enable discovery of AMs by containers
[ https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-4758: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Enable discovery of AMs by containers > - > > Key: YARN-4758 > URL: https://issues.apache.org/jira/browse/YARN-4758 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Junping Du >Priority: Major > Attachments: YARN-4758. AM Discovery Service for YARN Container.pdf > > > {color:red} > This is already discussed on the umbrella JIRA YARN-1489. > Copying some of my condensed summary from the design doc (section 3.2.10.3) > of YARN-4692. > {color} > Even after the existing work in work-preserving AM restart (Section 3.1.2 / > YARN-1489), we still haven't solved the problem of old running containers not > knowing where the new AM starts running after the previous AM crashes. This > is an especially important problem to solve for long running services, where > we'd like to avoid killing service containers when AMs fail over. So far, we > left this as a task for the apps, but solving it in YARN is highly desirable. > (Task) This looks very much like service-registry (YARN-913), but for app > containers to discover their own AMs. > Combining this requirement (of any container being able to find its AM > across failovers) with those of services (to be able to find through DNS > where a service container is running - YARN-4757) will push our registry > scalability needs much higher than those of just service endpoints. > This calls for a more distributed solution for registry readers, something > that is discussed in the comments section of YARN-1489 and MAPREDUCE-6608. > See comment > https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9741) [JDK11] TestAHSWebServices.testAbout fails
[ https://issues.apache.org/jira/browse/YARN-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9741: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > [JDK11] TestAHSWebServices.testAbout fails > -- > > Key: YARN-9741 > URL: https://issues.apache.org/jira/browse/YARN-9741 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > On openjdk-11.0.2 TestAHSWebServices.testAbout[0] fails consistently with the > following stack trace: > {noformat} > [ERROR] Tests run: 40, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 7.9 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices > [ERROR] > testAbout[0](org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices) > Time elapsed: 0.241 s <<< FAILURE! > org.junit.ComparisonFailure: expected: but > was: > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testAbout(TestAHSWebServices.java:333) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5902) yarn.scheduler.increment-allocation-mb and yarn.scheduler.increment-allocation-vcores are undocumented in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-5902: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > yarn.scheduler.increment-allocation-mb and > yarn.scheduler.increment-allocation-vcores are undocumented in > yarn-default.xml > -- > > Key: YARN-5902 > URL: https://issues.apache.org/jira/browse/YARN-5902 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: YARN-5902.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
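For illustration, a hedged sketch of how these two keys behave; the key names are from the summary above, and the defaults shown (1024 MB, 1 vcore) are the FairScheduler defaults to the best of my knowledge:
{code:java}
// Minimal sketch of the two undocumented increment-allocation keys.
import org.apache.hadoop.conf.Configuration;

public class IncrementAllocationDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int incrementMb =
        conf.getInt("yarn.scheduler.increment-allocation-mb", 1024);
    int incrementVcores =
        conf.getInt("yarn.scheduler.increment-allocation-vcores", 1);
    // FairScheduler rounds each container request up to the next multiple
    // of these increments, e.g. a 1500 MB ask becomes 2048 MB.
    System.out.println("memory increment = " + incrementMb
        + " MB, vcore increment = " + incrementVcores);
  }
}
{code}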
[jira] [Updated] (YARN-4804) [Umbrella] Improve test run duration
[ https://issues.apache.org/jira/browse/YARN-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-4804: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > [Umbrella] Improve test run duration > > > Key: YARN-4804 > URL: https://issues.apache.org/jira/browse/YARN-4804 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Priority: Major > > Our tests take a long time to run, e.g. the RM tests take 67 minutes. Given > that our precommit builds run our tests against two Java versions, this issue > is exacerbated. > Filing this umbrella JIRA to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6723) NM overallocation based on over-time rather than snapshot utilization
[ https://issues.apache.org/jira/browse/YARN-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6723: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > NM overallocation based on over-time rather than snapshot utilization > - > > Key: YARN-6723 > URL: https://issues.apache.org/jira/browse/YARN-6723 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > To continue discussion on Miklos's idea in YARN-6670 of > "Usually the CPU usage fluctuates quite a bit. Do not we need a time period > for NM_OVERALLOCATION_GENERAL_THRESHOLD, etc. to avoid allocating on small > glitches, even worse preempting in those cases?" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
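As a sketch of the idea only (not part of any posted patch), utilization snapshots could be smoothed over a time window, for example with an exponential moving average, so that a single glitch neither enables overallocation nor triggers preemption; the smoothing factor and threshold are placeholders:
{code:java}
// Illustrative smoothing of node utilization, not YARN code.
public class SmoothedUtilization {
  private final double alpha;   // smoothing factor, e.g. 0.2
  private double smoothed = -1; // -1 marks "no sample yet"

  public SmoothedUtilization(double alpha) {
    this.alpha = alpha;
  }

  /** Feed one utilization snapshot in [0, 1]; returns the smoothed value. */
  public synchronized double update(double snapshot) {
    smoothed = (smoothed < 0)
        ? snapshot
        : alpha * snapshot + (1 - alpha) * smoothed;
    return smoothed;
  }

  /** Overallocate only when the smoothed value is under the threshold. */
  public synchronized boolean belowThreshold(double threshold) {
    return smoothed >= 0 && smoothed < threshold;
  }
}
{code}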
[jira] [Updated] (YARN-9852) Allow multiple MiniYarnCluster to be used
[ https://issues.apache.org/jira/browse/YARN-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9852: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Allow multiple MiniYarnCluster to be used > - > > Key: YARN-9852 > URL: https://issues.apache.org/jira/browse/YARN-9852 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 3.2.1 >Reporter: Adam Antal >Priority: Major > > During implementation of new HBase replication tests we observed that there > are problems in the communication between multiple MiniYarnClusters in one > test suite. I haven't seen any testcase in the Hadoop repository that uses > multiple clusters in one test, but it seems like a logical request to allow > this. > If this jira does not involve any code change (it's mainly a configuration > issue), then I suggest adding a testcase that demonstrates such a suitable > configuration. > Thanks to [~bszabolcs] for the consultation about this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
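A hedged sketch of what such a two-cluster test might look like; whether two MiniYARNClusters can actually coexist cleanly in one JVM is exactly the open question here, so treat this as the target shape rather than a working recipe:
{code:java}
// Target shape of a two-cluster test, under the assumption that separate
// Configuration instances keep the clusters' ports and dirs apart.
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class TwoClusterSketch {
  public static void main(String[] args) throws Exception {
    // (testName, numNodeManagers, numLocalDirs, numLogDirs)
    MiniYARNCluster source = new MiniYARNCluster("source", 1, 1, 1);
    MiniYARNCluster target = new MiniYARNCluster("target", 1, 1, 1);
    source.init(new YarnConfiguration());
    target.init(new YarnConfiguration());
    source.start();
    target.start();
    try {
      // ... submit work to one cluster and replicate to the other ...
    } finally {
      target.stop();
      source.stop();
    }
  }
}
{code}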
[jira] [Updated] (YARN-8074) Support placement policy composite constraints in YARN Service
[ https://issues.apache.org/jira/browse/YARN-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-8074: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Support placement policy composite constraints in YARN Service > -- > > Key: YARN-8074 > URL: https://issues.apache.org/jira/browse/YARN-8074 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Major > > This is a follow up of YARN-7142 where we support more advanced placement > policy features like creating composite constraints by exposing expressions > in YARN Service specification. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
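For reference, the kind of composite constraint in question can already be expressed with the Java PlacementConstraints DSL; the missing piece this issue tracks is exposing such expressions in the YARN Service specification. A sketch using that existing DSL (the allocation tag names are made up):
{code:java}
// "not on a node that already runs an hbase-master container, AND on the
// same rack as some zookeeper container" - sketch only.
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.*;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

public class CompositeConstraintSketch {
  public static PlacementConstraint antiAffinityPlusRackAffinity() {
    return build(
        and(
            targetNotIn(NODE, allocationTag("hbase-master")),
            targetIn(RACK, allocationTag("zookeeper"))));
  }
}
{code}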
[jira] [Updated] (YARN-6653) Retrieve CPU and MEMORY metrics for applications in a flow run
[ https://issues.apache.org/jira/browse/YARN-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6653: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Retrieve CPU and MEMORY metrics for applications in a flow run > -- > > Key: YARN-6653 > URL: https://issues.apache.org/jira/browse/YARN-6653 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Akhil PB >Priority: Major > > Similarly to YARN-6651, > 'metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' can be added > to the web ui query fired by a user listing all applications in a flow run. > CPU and MEMORY can be retrieved this way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7882) Server side proxy for UI2 log viewer
[ https://issues.apache.org/jira/browse/YARN-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7882: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Server side proxy for UI2 log viewer > > > Key: YARN-7882 > URL: https://issues.apache.org/jira/browse/YARN-7882 > Project: Hadoop YARN > Issue Type: Bug > Components: security, timelineserver, yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Eric Yang >Priority: Major > > When viewing container logs in UI2, the log files are directly fetched > through timeline server 2. Hadoop in simple security mode does not have an > authenticator to make sure the user is authorized to view the logs. The > general practice is to use Knox or another security proxy to authenticate the > user and reverse proxy the request to the Hadoop UI to ensure the information > does not leak through an anonymous user. The current implementation of the > UI2 log viewer uses ajax calls to timeline server 2. This could prevent Knox > or reverse proxy software from working properly with the new design. It would > be good to perform server-side proxying to prevent the browser from > side-stepping the authentication check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto
[ https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6606: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > The implementation of LocalizationStatus in ContainerStatusProto > > > Key: YARN-6606 > URL: https://issues.apache.org/jira/browse/YARN-6606 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Bingxue Qiu >Priority: Major > Attachments: YARN-6606.1.patch, YARN-6606.2.patch > > > We have a use case where the full implementation of localization status in > ContainerStatusProto > [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf] > needs to be done, so we are implementing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7134) AppSchedulingInfo has a dependency on capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7134: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > AppSchedulingInfo has a dependency on capacity scheduler > > > Key: YARN-7134 > URL: https://issues.apache.org/jira/browse/YARN-7134 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Sunil G >Priority: Major > > The common scheduling code should be independent of all scheduler > implementations. YARN-6040 introduced capacity scheduler's > {{SchedulingMode}} into {{AppSchedulingInfo}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5205) yarn logs for live applications does not provide log files which may have already been aggregated
[ https://issues.apache.org/jira/browse/YARN-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-5205: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > yarn logs for live applications does not provide log files which may have > already been aggregated > - > > Key: YARN-5205 > URL: https://issues.apache.org/jira/browse/YARN-5205 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth >Priority: Major > > With periodic aggregation enabled, logs which have been partially aggregated > are not always displayed by the yarn logs command. > If a file exists in the log dir for a container, all previously aggregated > files with the same name, along with the current file, will be part of the > yarn logs output. > Files which have been previously aggregated, for which a file with the same > name does not exist in the container log dir, do not show up in the output. > After the app completes, all logs are available. > cc [~xgong] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9936: --- Target Version/s: 3.4.0 (was: 3.3.0, 3.2.2) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Support vector of capacity percentages in Capacity Scheduler configuration > -- > > Key: YARN-9936 > URL: https://issues.apache.org/jira/browse/YARN-9936 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Zoltan Siegl >Assignee: Zoltan Siegl >Priority: Major > Attachments: Capacity Scheduler support of “vector of resources > percentage”.pdf > > > Currently, the Capacity Scheduler queue configuration supports two ways to > set queue capacity. > * As a percentage of all available resources, expressed as a float (e.g. 25.0), > meaning 25% of the resources of its parent queue for all resource types > equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all > available GPUs in the cluster). The percentages of all queues have to add up > to 100%. > * As an absolute amount of resources (e.g. > memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the > queues has to be less than or equal to all resources in the cluster. > Apart from these two already existing ways, there is a demand to set the > capacity percentage of each available resource type separately (e.g. > {{memory=20%,vcores=40%,yarn.io/gpu=100%}}). > At the same time, a similar concept should be included for the queues' > maximum-capacity as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
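A sketch contrasting the two existing forms with the proposed one; the bracketed mixed-percentage syntax in the last setting is the proposal from this issue, not something current releases parse:
{code:java}
// Sketch of capacity-scheduler settings; normally these live in
// capacity-scheduler.xml, shown here via the Configuration API.
import org.apache.hadoop.conf.Configuration;

public class CapacityVectorSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing: a single percentage applied to every resource type.
    conf.set("yarn.scheduler.capacity.root.a.capacity", "25.0");
    // Existing: absolute resources.
    conf.set("yarn.scheduler.capacity.root.b.capacity",
        "[memory=4096,vcores=20,yarn.io/gpu=4]");
    // Proposed (hypothetical syntax): one percentage per resource type.
    conf.set("yarn.scheduler.capacity.root.c.capacity",
        "[memory=20%,vcores=40%,yarn.io/gpu=100%]");
  }
}
{code}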
[jira] [Updated] (YARN-7884) Race condition in registering YARN service in ZooKeeper
[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7884: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Race condition in registering YARN service in ZooKeeper > --- > > Key: YARN-7884 > URL: https://issues.apache.org/jira/browse/YARN-7884 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > In Kerberos enabled cluster, there seems to be a race condition for > registering YARN service. > Yarn-service znode creation seems to happen after AM started and reporting > back to update components information. For some reason, Yarnservice znode > should have access to create the znode, but reported NoAuth. > {code} > 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry > user accounts: sasl:hbase > 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default > system acls: > [1,s{'world,'anyone} > , 31,s{'sasl,'yarn} > , 31,s{'sasl,'jhs} > , 31,s{'sasl,'hdfs-demo} > , 31,s{'sasl,'rm} > , 31,s{'sasl,'hive} > ] > 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs > [31,s{'sasl,'hbase} > , 31,s{'sasl,'hbase} > ] > 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering > class org.apache.hadoop.yarn.service.component.ComponentEventType for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler > 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering > class > org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType > for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler > 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of > the thread pool size is 500 > 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service > as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS) > 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: > class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler > 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - > Starting Socket Reader #1 for port 56859 > 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding > protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to > the server > 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server > Responder: starting > 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC > Server listener on 56859: starting > 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated > ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859 > 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating > CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" > 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl > client: jaasClientEntry = Client, principal = > hbase/eyang-5.openstacklo...@example.com, keytab = > /etc/security/keytabs/hbase.service.keytab > 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to > ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032 > 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering > appattempt_1517611904996_0001_01, abc 
into registry > 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 > containers from previous attempt. > 2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not > read component paths: `/users/hbase/services/yarn-service/abc/components': No > such file or directory: KeeperErrorCode = NoNode for > /registry/users/hbase/services/yarn-service/abc/components > 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering > initial evaluation of component sleeper > 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT > sleeper]: 2 instances. > 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT > sleeper] Transitioned from INIT to FLEXING on FLEX event. > 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - > Failed to register app abc in registry > org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: > `/registry/users/hbase/services/yarn-service/abc': Not authorized to access > path; ACLs: [ > 0x01: 'world,'anyone > 0x1f: 'sasl,'yarn > 0x1f: 'sasl,'jhs > 0x1f: 'sasl,'hdfs-demo > 0x1f: 'sasl,'rm > 0x1f: 'sasl,'hive > 0x1f:
[jira] [Updated] (YARN-7342) Application page doesn't show correct metrics for reservation runs
[ https://issues.apache.org/jira/browse/YARN-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7342: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Application page doesn't show correct metrics for reservation runs > --- > > Key: YARN-7342 > URL: https://issues.apache.org/jira/browse/YARN-7342 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, reservation system >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Priority: Major > Attachments: Screen Shot 2017-10-16 at 17.27.48.png > > > As the screen shot shows, there are some bugs on the webUI while running jobs > with reservations. For example, the queue name should just be "root.queueA" > instead of the internal queue name. All metrics (Allocated CPU, % of queue, > etc.) are missing for reservation runs. These should be a blocker though. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6963) Prevent other containers from starting when a container is re-initializing
[ https://issues.apache.org/jira/browse/YARN-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6963: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Prevent other containers from starting when a container is re-initializing > - > > Key: YARN-6963 > URL: https://issues.apache.org/jira/browse/YARN-6963 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > > Further to discussions in YARN-6920. > Container re-initialization will lead to momentary relinquishing of NM > resources when the container is brought down, followed by re-claiming of the > same resources when it is re-launched. If there are Opportunistic containers > in the queue, it can lead to unnecessary churn if one of those opportunistic > containers is started and immediately killed. > This JIRA tracks the changes required to prevent the above by ensuring the > resources for a container are 'locked' for the duration of the container > lifetime - including the time it takes for a re-initialization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7912) While launching Native Service app from UI, consider service owner name from user.name query parameter
[ https://issues.apache.org/jira/browse/YARN-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7912: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > While launching Native Service app from UI, consider service owner name from > user.name query parameter > -- > > Key: YARN-7912 > URL: https://issues.apache.org/jira/browse/YARN-7912 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sunil G >Priority: Major > > As per comments from [~eyang] in YARN-7827, > "For supporting knox, it would be good for javascript to detect the url > entering /ui2 and process user.name property. If there isn't one found, then > proceed with ajax call to resource manager to find out who is the current > user to pass the parameter along the rest api calls." > This Jira will track handling this. This is pending a feasibility check. > Thanks [~eyang] and [~jianhe] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7086) Release all containers asynchronously
[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7086: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Release all containers asynchronously > > > Key: YARN-7086 > URL: https://issues.apache.org/jira/browse/YARN-7086 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Arun Suresh >Assignee: Manikandan R >Priority: Major > Attachments: YARN-7086.001.patch, YARN-7086.002.patch, > YARN-7086.Perf-test-case.patch > > > We have noticed in production two situations that can cause deadlocks and > cause scheduling of new containers to come to a halt, especially with regard > to applications that have a lot of live containers: > # When these applications release these containers in bulk. > # When these applications terminate abruptly due to some failure, the > scheduler releases all their live containers in a loop. > To handle the issues mentioned above, we have a patch in production to make > sure ALL container releases happen asynchronously - and it has served us well. > Opening this JIRA to gather feedback on whether this is a good idea generally > (cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd]) > BTW, in YARN-6251, we already have an asyncReleaseContainer() in the > AbstractYarnScheduler and a corresponding scheduler event, which is currently > used specifically for the container-update code paths (where the scheduler > releases temp containers which it creates for the update) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
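A stripped-down illustration of the enqueue-and-drain shape described above; the real code goes through the YARN event dispatcher and scheduler events, so this is only a sketch of the pattern, not the patch:
{code:java}
// Release N containers without holding the caller (and its locks) hostage:
// enqueue one item per container, drain on a single worker thread.
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncReleaseSketch {
  private final BlockingQueue<String> releaseQueue =
      new LinkedBlockingQueue<>();

  /** Called from the app-completion path; returns immediately. */
  public void releaseAll(List<String> liveContainerIds) {
    releaseQueue.addAll(liveContainerIds);
  }

  /** Runs on a single dispatcher thread. */
  public void dispatchLoop() throws InterruptedException {
    while (true) {
      String containerId = releaseQueue.take();
      // ... perform the (lock-taking) release work for one container ...
      System.out.println("released " + containerId);
    }
  }
}
{code}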
[jira] [Updated] (YARN-7263) Check host name resolution performance when resource manager starts up
[ https://issues.apache.org/jira/browse/YARN-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7263: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Check host name resolution performance when resource manager starts up > -- > > Key: YARN-7263 > URL: https://issues.apache.org/jira/browse/YARN-7263 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Priority: Major > > According to YARN-7207, host name resolution could be slow in some > environments, which affects RM performance in different ways. It would be > nice to check this when the RM starts up and place a warning message in the > logs if the performance is not ideal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
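A minimal sketch of the proposed startup check, assuming a hypothetical helper invoked from RM startup; the 1-second threshold is an arbitrary placeholder:
{code:java}
// Time a forward and reverse lookup of the local host; warn when slow.
import java.net.InetAddress;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HostResolutionCheck {
  private static final Logger LOG =
      LoggerFactory.getLogger(HostResolutionCheck.class);
  private static final long WARN_THRESHOLD_MS = 1000;

  public static void checkAtStartup() {
    long start = System.nanoTime();
    try {
      InetAddress local = InetAddress.getLocalHost();
      local.getCanonicalHostName(); // forces a reverse lookup
    } catch (Exception e) {
      LOG.warn("Local host name resolution failed", e);
      return;
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMs > WARN_THRESHOLD_MS) {
      LOG.warn("Host name resolution took {} ms; slow DNS can degrade "
          + "ResourceManager performance (see YARN-7207).", elapsedMs);
    }
  }
}
{code}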
[jira] [Updated] (YARN-9652) Convert SchedulerQueueManager from a protocol-only type to a basic hierarchical queue implementation
[ https://issues.apache.org/jira/browse/YARN-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9652: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Convert SchedulerQueueManager from a protocol-only type to a basic > hierarchical queue implementation > > > Key: YARN-9652 > URL: https://issues.apache.org/jira/browse/YARN-9652 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system, scheduler >Affects Versions: 3.3.0 >Reporter: Erkin Alp Güney >Priority: Major > > SchedulerQueueManager is currently an interface, aka a protocol-only type. As > seen in the codebase, each scheduler implements the queue configuration and > management machinery over and over. If we convert it into a concrete base > class with a simple implementation of a hierarchical queue system (as in the > Fair and Capacity schedulers), pluggable schedulers may be developed more > easily. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
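One possible shape for the proposed base class; the method names mirror the SchedulerQueueManager interface, while everything else is a sketch of a "simple hierarchical queue system", not a committed design:
{code:java}
// Sketch: a generic base holding the queue-name map that each scheduler
// currently re-implements; Q stands in for the scheduler's queue type.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public abstract class BaseQueueManager<Q> {
  private final Map<String, Q> queues = new ConcurrentHashMap<>();
  private Q rootQueue;

  public Q getRootQueue() {
    return rootQueue;
  }

  protected void setRootQueue(Q root) {
    this.rootQueue = root;
    addQueue("root", root);
  }

  public Q getQueue(String queueName) {
    return queues.get(queueName);
  }

  public void addQueue(String queueName, Q queue) {
    queues.put(queueName, queue);
  }

  public void removeQueue(String queueName) {
    queues.remove(queueName);
  }

  public Map<String, Q> getQueues() {
    return queues;
  }
}
{code}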
[jira] [Updated] (YARN-6466) Provide shaded framework jar for containers
[ https://issues.apache.org/jira/browse/YARN-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6466: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Provide shaded framework jar for containers > --- > > Key: YARN-6466 > URL: https://issues.apache.org/jira/browse/YARN-6466 > Project: Hadoop YARN > Issue Type: New Feature > Components: build, yarn >Affects Versions: 3.0.0-alpha1 >Reporter: Sean Busbey >Assignee: Haibo Chen >Priority: Major > > We should build on the existing shading work to provide a jar with all of the > bits needed within a YARN application's container to talk to the resource > manager and node manager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8733) Readiness check for remote component
[ https://issues.apache.org/jira/browse/YARN-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-8733: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Readiness check for remote component > > > Key: YARN-8733 > URL: https://issues.apache.org/jira/browse/YARN-8733 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Major > > When a service is deploying, there can be remote component dependencies > between services. For example, Hive server 2 can depend on Hive metastore, > which depends on a remote MySQL database. It would be great to have the > ability to check the remote server and port to make sure MySQL is available > before deploying the Hive LLAP service. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
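A minimal sketch of the readiness probe this asks for: a bounded TCP connect to host:port that the component readiness logic could retry; the host, port, and timeout values are placeholders:
{code:java}
// Bounded TCP connect check; the caller retries on its own schedule.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RemotePortReadinessCheck {
  public static boolean isReachable(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    } catch (IOException e) {
      return false; // not ready yet
    }
  }

  public static void main(String[] args) {
    // e.g. hold Hive deployment until the MySQL behind the metastore is up
    System.out.println(isReachable("mysql.example.com", 3306, 2000));
  }
}
{code}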
[jira] [Updated] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
[ https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-8779: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Fix few discrepancies between YARN Service swagger spec and code > > > Key: YARN-8779 > URL: https://issues.apache.org/jira/browse/YARN-8779 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Gour Saha >Priority: Major > > Following issues were identified in YARN Service swagger definition during an > effort to integrate with a running service by generating Java and Go > client-side stubs from the spec - > > 1. > *restartPolicy* is wrong and should be *restart_policy* > > 2. > A DELETE request to a non-existing service (or a previously existing but > deleted service) throws an ApiException instead of something like > NotFoundException (the equivalent of 404). Note, DELETE of an existing > service behaves fine. > > 3. > The response code of DELETE request is 200. The spec says 204. Since the > response has a payload, the spec should be updated to 200 instead of 204. > > 4. > _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method > does not return a Service object. Swagger definition has the below bug in GET > response of */app/v1/services/\{service_name}* - > {code:java} > type: object > items: > $ref: '#/definitions/Service' > {code} > It should be - > {code:java} > $ref: '#/definitions/Service' > {code} > > 5. > Serialization issues were seen in all enum classes - ServiceState.java, > ContainerState.java, ComponentState.java, PlacementType.java and > PlacementScope.java. > Java client threw the below exception for ServiceState - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Cannot construct instance of > `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one > Creator exists): no String-argument constructor/factory method to deserialize > from String value ('ACCEPTED') > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 121] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["state”]) > {code} > For Golang we saw this for ContainerState - > {code:java} > ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot > unmarshal string into Go struct field Container.state of type > yarnmodel.ContainerState > {code} > > 6. > *launch_time* actually returns an integer but swagger definition says date. > Hence, the following exception is seen on the client side - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or > string. > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 477] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”]) > {code} > > 8. > *user.name* query param with a valid value is required for all API calls to > an unsecure cluster. This is not defined in the spec. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9675) Expose log aggregation diagnostic messages through RM API
[ https://issues.apache.org/jira/browse/YARN-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9675: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Expose log aggregation diagnostic messages through RM API > - > > Key: YARN-9675 > URL: https://issues.apache.org/jira/browse/YARN-9675 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, log-aggregation, resourcemanager >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > The ResourceManager collects the log aggregation status reports from the > NodeManagers. Currently these reports are collected, but when the app info > API or a similar high-level REST endpoint is called, only an overall status > is displayed (RUNNING, RUNNING_WITH_FAILURES, FAILED, etc.). > The diagnostic messages are only available through the old RM web UI, so our > internal tool currently crawls that page and extracts the log aggregation > diagnostic and error messages from the raw HTML. This is not a good practice, > and a more elegant API call would be preferable. It may be useful for others > as well, since log aggregation related failures are usually hard to debug > given the lack of trace/debug messages. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5852) Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext class in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-5852: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext > class in CapacityScheduler > > > Key: YARN-5852 > URL: https://issues.apache.org/jira/browse/YARN-5852 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Priority: Major > > Quite a few data structures with similar names wrap container-related info: > CSAssignment, ContainerAllocation, ContainerAllocationContext. And there is a > bunch of code to convert one to another. We should consolidate those into a > single one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM
[ https://issues.apache.org/jira/browse/YARN-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-5194: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Avoid adding yarn-site to all Configuration instances created by the JVM > > > Key: YARN-5194 > URL: https://issues.apache.org/jira/browse/YARN-5194 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Seth >Priority: Major > > {code} > static { > addDeprecatedKeys(); > Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE); > Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE); > } > {code} > This puts the contents of yarn-default and yarn-site into every configuration > instance created in the VM after YarnConfiguration has been initialized. > This should be changed to a local addResource for the specific > YarnConfiguration instance, instead of polluting every Configuration instance. > Incompatible change. Have set the target version to 3.x. > The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration > (core-site.xml etc). > core-site may be worth including everywhere, however it would be better to > expect users to explicitly add the relevant resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
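A hedged sketch of the per-instance direction the description suggests; the real YarnConfiguration does more than this, so the class below only illustrates replacing the static addDefaultResource calls with instance-level addResource:
{code:java}
// Only this instance sees the YARN resources; plain Configuration
// instances elsewhere in the JVM are left untouched.
import org.apache.hadoop.conf.Configuration;

public class YarnScopedConfiguration extends Configuration {
  public YarnScopedConfiguration() {
    super();
    addResource("yarn-default.xml");
    addResource("yarn-site.xml");
  }
}
{code}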
[jira] [Updated] (YARN-8161) ServiceState FLEX should be removed
[ https://issues.apache.org/jira/browse/YARN-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-8161: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > ServiceState FLEX should be removed > --- > > Key: YARN-8161 > URL: https://issues.apache.org/jira/browse/YARN-8161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Gour Saha >Priority: Major > > ServiceState FLEX is not required to trigger flex up/down of containers and > should be removed -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6651) Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory
[ https://issues.apache.org/jira/browse/YARN-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6651: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to > retrieve CPU and memory > -- > > Key: YARN-6651 > URL: https://issues.apache.org/jira/browse/YARN-6651 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Akhil PB >Priority: Major > > When you click on Flow Activity => \{a flow\} => flow runs, the web server > sends a REST query to the ATSv2 TimelineReaderServer, but it does not include > the query param 'metricstoretrieve' to get any metrics back. > Instead, we should add > '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the > query to get CPU and MEMORY back. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
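For illustration, the corrected reader query: the documented flow-runs REST path plus the metricstoretrieve parameter from this issue; the host, port, cluster, user, and flow names are placeholders:
{code:java}
// Fire the flow-runs query with metricstoretrieve set.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class FlowRunMetricsQuery {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://timelinereader.example.com:8188"
        + "/ws/v2/timeline/clusters/test-cluster/users/alice"
        + "/flows/myflow/runs"
        + "?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (InputStream in = conn.getInputStream()) {
      // ... parse the JSON response; each run now carries CPU/MEMORY ...
      System.out.println("HTTP " + conn.getResponseCode());
    }
  }
}
{code}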
[jira] [Updated] (YARN-6690) Consolidate NM overallocation thresholds with ResourceTypes
[ https://issues.apache.org/jira/browse/YARN-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6690: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Consolidate NM overallocation thresholds with ResourceTypes > > > Key: YARN-6690 > URL: https://issues.apache.org/jira/browse/YARN-6690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > YARN-3926 (ResourceTypes) introduces a new class ResourceInformation to > encapsulate all information about a given resource type (e.g. type, value, > unit). We could add the overallocation thresholds to it as well. > Another thing to look at, as suggested by Wangda in YARN-4511 is whether we > could just use ResourceThresholds to replace OverallocationInfo. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7182) YARN's StateMachine should be stable
[ https://issues.apache.org/jira/browse/YARN-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7182: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > YARN's StateMachine should be stable > > > Key: YARN-7182 > URL: https://issues.apache.org/jira/browse/YARN-7182 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Priority: Major > > It's currently {{Evolving}}, which is clearly no longer true. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
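The requested change in miniature; the interface body below is a trimmed stand-in for the real org.apache.hadoop.yarn.state.StateMachine, and the audience annotation is illustrative:
{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public
@InterfaceStability.Stable   // was @InterfaceStability.Evolving
public interface StateMachineSketch<STATE, EVENTTYPE, EVENT> {
  STATE getCurrentState();
  STATE doTransition(EVENTTYPE eventType, EVENT event) throws Exception;
}
{code}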
[jira] [Updated] (YARN-10059) Final states of failed-to-localize containers are not recorded in NM state store
[ https://issues.apache.org/jira/browse/YARN-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10059: Target Version/s: 3.4.0 (was: 3.3.0, 3.2.2, 3.1.4, 2.10.1) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Final states of failed-to-localize containers are not recorded in NM state > store > > > Key: YARN-10059 > URL: https://issues.apache.org/jira/browse/YARN-10059 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-10059.001.patch > > > We recently found an issue where many localizers of completed containers were > launched and exhausted the memory/cpu of a machine after the NM restarted. > These containers had all failed and completed when localizing on a > non-existent local directory, which was caused by another problem, but their > final states weren't recorded in the NM state store. > The process flow of a fail-to-localize container is as follows: > {noformat} > ResourceLocalizationService$LocalizerRunner#run > -> ContainerImpl$ResourceFailedTransition#transition handle LOCALIZING -> > LOCALIZATION_FAILED upon RESOURCE_FAILED > dispatch LocalizationEventType.CLEANUP_CONTAINER_RESOURCES > -> ResourceLocalizationService#handleCleanupContainerResources handle > CLEANUP_CONTAINER_RESOURCES > dispatch ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP > -> ContainerImpl$LocalizationFailedToDoneTransition#transition > handle LOCALIZATION_FAILED -> DONE upon CONTAINER_RESOURCES_CLEANEDUP > {noformat} > There's no state store update in this flow now, and one is required to avoid > unnecessary localizations after NM restarts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
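A sketch (not the attached patch) of the missing persistence step: record the completion in the NM state store when the container reaches DONE via LOCALIZATION_FAILED; the StateStore interface below is a stand-in for the real NM state store service:
{code:java}
public class LocalizationFailedToDoneSketch {
  interface StateStore {
    void storeContainerCompleted(String containerId, int exitStatus)
        throws java.io.IOException;
  }

  void transition(StateStore stateStore, String containerId, int exitStatus) {
    try {
      // Persist the final state so a restarted NM skips relocalization.
      stateStore.storeContainerCompleted(containerId, exitStatus);
    } catch (java.io.IOException e) {
      // Log and continue: failing the transition would leave the container
      // stuck, which is worse than a stale store entry.
      System.err.println("Unable to persist completion of " + containerId
          + ": " + e);
    }
  }
}
{code}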
[jira] [Updated] (YARN-6429) Revisit implementation of LocalitySchedulingPlacementSet to avoid invoking methods of AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6429: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Revisit implementation of LocalitySchedulingPlacementSet to avoid invoking > methods of AppSchedulingInfo > - > > Key: YARN-6429 > URL: https://issues.apache.org/jira/browse/YARN-6429 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > > An example is LocalitySchedulingPlacementSet#decrementOutstanding: it calls > appSchedulingInfo directly, which could potentially cause trouble since it > tries to modify the parent from the child. Is it possible to move this logic > to AppSchedulingInfo#allocate? > Other methods need to be checked as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8583) Inconsistency in YARN status command
[ https://issues.apache.org/jira/browse/YARN-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-8583: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Inconsistency in YARN status command > > > Key: YARN-8583 > URL: https://issues.apache.org/jira/browse/YARN-8583 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Priority: Major > > The YARN app -status command can report based on application ID or > application name, with some usability limitations. Application ID is globally > unique, and it allows any user to query the application status of any > application. Application name is not globally unique, and it will only work > for querying the user's own applications. This is somewhat restrictive for > application administrators, but allowing other users to query any other > user's applications could be considered a security hole as well. There are > two possible options to reduce the inconsistency: > Option 1. Block other users from querying application status. This may > improve security in some sense, but it is an incompatible change. This is a > simpler change, matching the owner of the application and deciding whether or > not to report. > Option 2. Add a --user parameter to allow an administrator to query an > application name run by another user. This is a bigger change because > application metadata is stored in the user's own hdfs directory. There are > security restrictions that need to be defined. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4713) Warning by unchecked conversion in TestTimelineWebServices
[ https://issues.apache.org/jira/browse/YARN-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-4713: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Warning by unchecked conversion in TestTimelineWebServices > --- > > Key: YARN-4713 > URL: https://issues.apache.org/jira/browse/YARN-4713 > Project: Hadoop YARN > Issue Type: Test > Components: test >Reporter: Tsuyoshi Ozawa >Priority: Major > Labels: newbie > Attachments: YARN-4713.1.patch, YARN-4713.2.patch > > > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java:[123,38] > [unchecked] unchecked conversion > {code} > Enumeration names = mock(Enumeration.class); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
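One conventional way to resolve the warning while keeping the useful element type; a sketch of a possible fix, not necessarily the attached patch:
{code:java}
import static org.mockito.Mockito.mock;
import java.util.Enumeration;

public class MockEnumerationFix {
  @SuppressWarnings("unchecked") // mock(Class) can only return a raw type
  static Enumeration<String> mockNames() {
    return (Enumeration<String>) mock(Enumeration.class);
  }
}
{code}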
[jira] [Updated] (YARN-6652) Merge flow info and flow runs
[ https://issues.apache.org/jira/browse/YARN-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6652: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Merge flow info and flow runs > - > > Key: YARN-6652 > URL: https://issues.apache.org/jira/browse/YARN-6652 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Akhil PB >Priority: Major > > If a user clicks on a flow from the flow activity page, Flow Run and Flow > Info are shown separately. Usually, users want to go to individual flow runs. > With the current workflow, the user will need to click on Flow Run because > Flow Info is selected by default. > Given that Flow Info does not have much information, it'd be a nice > improvement if we could show flow info and flow runs together, that is, one > section at the top containing flow info and another section at the bottom > containing the flow runs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6527) Provide a better out-of-the-box experience for SLS
[ https://issues.apache.org/jira/browse/YARN-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6527: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Provide a better out-of-the-box experience for SLS > -- > > Key: YARN-6527 > URL: https://issues.apache.org/jira/browse/YARN-6527 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.0.0-alpha4 >Reporter: Robert Kanter >Priority: Major > > The example provided with SLS appears to be broken - I didn't see any jobs > running. On top of that, it seems like getting SLS to run properly requires > a lot of hadoop site configs, scheduler configs, etc. I was only able to get > something running after [~yufeigu] provided a lot of config files. > We should provide a better out-of-the-box experience for SLS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6088) RM UI has to redirect to AHS for completed applications logs
[ https://issues.apache.org/jira/browse/YARN-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6088: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > RM UI has to redirect to AHS for completed applications logs > > > Key: YARN-6088 > URL: https://issues.apache.org/jira/browse/YARN-6088 > Project: Hadoop YARN > Issue Type: Task > Components: webapp >Affects Versions: 2.7.3 >Reporter: Sunil G >Priority: Major > > Currently the AM container logs link in RMAppBlock is hardcoded to the > container's host node. If that node is unavailable, we will not have enough > information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10065) Support Placement Constraints for AM container allocations
[ https://issues.apache.org/jira/browse/YARN-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10065: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Support Placement Constraints for AM container allocations > -- > > Key: YARN-10065 > URL: https://issues.apache.org/jira/browse/YARN-10065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: Daniel Velasquez >Priority: Major > > Currently the ApplicationSubmissionContext API supports specifying a node label > expression for the AM resource request. It would be beneficial to have the > ability to specify Placement Constraints as well for the AM resource request. > We have a requirement to constrain AM containers to certain nodes, e.g. AM > containers not on preemptible/spot cloud instances. It looks like node > attributes would fit our use case well. However, we currently don't have the > ability to specify this in the API for AM resource requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
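To make the gap concrete, a short sketch of what the API allows today versus what this Jira asks for. setNodeLabelExpression exists on ApplicationSubmissionContext; the placement-constraint setter is hypothetical and only illustrates the requested extension.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

public final class AmPlacementExample {
  private AmPlacementExample() {
  }

  public static void steerAmRequest(ApplicationSubmissionContext ctx) {
    // Supported today: a node label expression for the AM resource request.
    ctx.setNodeLabelExpression("on-demand");
    // Requested by this Jira (hypothetical -- no such setter exists yet):
    // ctx.setAMPlacementConstraint(...);
  }
}
{code}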
[jira] [Updated] (YARN-7867) Enable YARN service by default
[ https://issues.apache.org/jira/browse/YARN-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7867: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Enable YARN service by default > -- > > Key: YARN-7867 > URL: https://issues.apache.org/jira/browse/YARN-7867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > The YARN service REST API is disabled by default. We will make the decision to > turn on this feature by default when the code is mature enough to be consumed > by the public. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7418) Improve performance of locking in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-7418: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Improve performance of locking in fair scheduler > > > Key: YARN-7418 > URL: https://issues.apache.org/jira/browse/YARN-7418 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > > Based on initial testing, we can improve scheduler performance by 5%-10% with > some simple optimizations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9995) Code cleanup in TestSchedConfCLI
[ https://issues.apache.org/jira/browse/YARN-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079668#comment-17079668 ] Bilwa S T commented on YARN-9995: - Hi [~snemeth], TestSchedConfCLI#testFormatSchedulerConf fails in branch-3.2, so I have raised YARN-10230 to fix it. > Code cleanup in TestSchedConfCLI > > > Key: YARN-9995 > URL: https://issues.apache.org/jira/browse/YARN-9995 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Bilwa S T >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-9995.001.patch, YARN-9995.002.patch, > YARN-9995.003.patch, YARN-9995.004.patch, YARN-9995.branch-3.2.patch > > > Some tests are too verbose: > - add / delete / remove queues test cases: Creating SchedConfUpdateInfo > instances could be simplified with a helper method or something like that. > - Some fields can be converted to local variables: sysOutStream, sysOut, > sysErr, csConf > - Any additional cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
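A possible shape for the helper the description asks for, assuming the SchedConfUpdateInfo/QueueConfigInfo DAO API; the method name and placement are illustrative, not the committed cleanup.
{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.webapp.dao.QueueConfigInfo;
import org.apache.hadoop.yarn.webapp.dao.SchedConfUpdateInfo;

// Illustrative helper: build an update that adds one queue with a single
// config entry, so each add/delete/update test case stays one line long.
public final class SchedConfTestHelper {
  private SchedConfTestHelper() {
  }

  public static SchedConfUpdateInfo addQueueUpdate(String queuePath,
      String key, String value) {
    Map<String, String> params = new HashMap<>();
    params.put(key, value);
    SchedConfUpdateInfo updateInfo = new SchedConfUpdateInfo();
    updateInfo.getAddQueueInfo().add(new QueueConfigInfo(queuePath, params));
    return updateInfo;
  }
}
{code}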
[jira] [Updated] (YARN-6831) Miscellaneous refactoring changes of ContainScheduler
[ https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6831: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Miscellaneous refactoring changes of ContainScheduler > -- > > Key: YARN-6831 > URL: https://issues.apache.org/jira/browse/YARN-6831 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > While reviewing YARN-6706, Karthik pointed out a few issues for improvement in > ContainerScheduler: > *Make ResourceUtilizationTracker pluggable. That way, we could use a > different tracker when oversubscription is enabled. > *ContainerScheduler > ##Why do we need maxOppQueueLength given queuingLimit? > ##Is there value in splitting runningContainers into runningGuaranteed and > runningOpportunistic? > ##getOpportunisticContainersStatus method implementation feels awkward. How > about capturing the state in the field here, and have metrics etc. pull from > here? > ##startContainersFromQueue: Local variable resourcesAvailable is unnecessary > *OpportunisticContainersStatus > ##Let us clearly differentiate between allocated, used and utilized. Maybe, > we should rename current Used methods to Allocated? > ##I prefer either full name Opportunistic (in method) or Opp (shortest name > that makes sense). Opport is neither short nor fully descriptive. > ##Have we considered folding ContainerQueuingLimit class into this? > We decided to move the issues into this follow-up jira to keep YARN-6706 > moving forward to unblock oversubscription work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
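On the first bullet above, one hedged reading of "make ResourceUtilizationTracker pluggable" is to select the tracker implementation from configuration. In the sketch below, the config key and OversubscriptionUtilizationTracker are hypothetical; AllocationBasedResourceUtilizationTracker is the existing default tracker in the NodeManager's scheduler package.
{code:java}
package org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler;

import org.apache.hadoop.conf.Configuration;

// Sketch only, not the actual NodeManager wiring.
public final class TrackerFactory {
  private TrackerFactory() {
  }

  static ResourceUtilizationTracker newTracker(Configuration conf,
      ContainerScheduler scheduler) {
    boolean oversubscription = conf.getBoolean(
        "yarn.nodemanager.oversubscription.enabled", false); // hypothetical key
    return oversubscription
        ? new OversubscriptionUtilizationTracker(scheduler)  // hypothetical class
        : new AllocationBasedResourceUtilizationTracker(scheduler);
  }
}
{code}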
[jira] [Created] (YARN-10230) TestSchedConfCLI#testFormatSchedulerConf fails
Bilwa S T created YARN-10230: Summary: TestSchedConfCLI#testFormatSchedulerConf fails Key: YARN-10230 URL: https://issues.apache.org/jira/browse/YARN-10230 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.2.0 Reporter: Bilwa S T Assignee: Bilwa S T {code:java} [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.979 s <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestSchedConfCLI [ERROR] testFormatSchedulerConf(org.apache.hadoop.yarn.client.cli.TestSchedConfCLI) Time elapsed: 10.017 s <<< ERROR! java.lang.Exception: test timed out after 1 milliseconds at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) at com.sun.jersey.api.client.Client.handle(Client.java:652) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) at org.apache.hadoop.yarn.client.cli.SchedConfCLI.formatSchedulerConf(SchedConfCLI.java:191) at org.apache.hadoop.yarn.client.cli.TestSchedConfCLI.testFormatSchedulerConf(TestSchedConfCLI.java:226) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
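For reference, the failure above is a JUnit timeout. One possible direction, shown as a sketch below, is simply to widen the test's time budget; the actual YARN-10230 patch may instead fix the underlying hang in formatSchedulerConf, and the timeout value here is an assumption.
{code:java}
import org.junit.Test;

public class TestSchedConfCLITimeoutSketch {
  // Sketch only: raising the per-test budget helps on slow hosts but does
  // not address a genuine hang in SchedConfCLI.formatSchedulerConf.
  @Test(timeout = 100000)
  public void testFormatSchedulerConf() throws Exception {
    // ... unchanged test body ...
  }
}
{code}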
[jira] [Commented] (YARN-10212) Create separate configuration for max global AM attempts
[ https://issues.apache.org/jira/browse/YARN-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079664#comment-17079664 ] Hudson commented on YARN-10212: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18135 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18135/]) YARN-10212. Create separate configuration for max global AM attempts. (jhung: rev 23481ad378de7f8e95eabefbd102825f757714b8) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java > Create separate configuration for max global AM attempts > > > Key: YARN-10212 > URL: https://issues.apache.org/jira/browse/YARN-10212 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Bilwa S T >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1, 3.4.0 > > Attachments: YARN-10212.001.patch, YARN-10212.002.patch, > YARN-10212.003.patch, YARN-10212.004.patch > > > Right now user's default max AM attempts is set to the same as global max AM > attempts: > {noformat} > int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, > YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS); {noformat} > If we want to increase global max AM attempts, it will also increase the > default. So we should create a separate global AM max attempts config to > separate the two. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
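The quoted snippet is the coupling in question. A hedged sketch of the proposed separation follows; the literal key is an assumption for illustration, so check YarnConfiguration in the committed patch for the actual constant.
{code:java}
// Sketch: read a dedicated global limit first, falling back to the old
// per-app setting. The string key below is assumed, not confirmed.
int globalMaxAppAttempts = conf.getInt(
    "yarn.resourcemanager.am.global.max-attempts",
    conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS));
{code}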
[jira] [Commented] (YARN-10227) Pull YARN-8242 back to branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079656#comment-17079656 ] Jonathan Hung commented on YARN-10227: -- Thanks Jim for fixing this. Belated +1 from me. > Pull YARN-8242 back to branch-2.10 > -- > > Key: YARN-10227 > URL: https://issues.apache.org/jira/browse/YARN-10227 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0, 2.10.1 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Fix For: 2.10.1 > > Attachments: YARN-10227-branch-2.10.001.patch > > > We have recently seen the nodemanager OOM issue reported in YARN-8242 during > a rolling upgrade. Our code is currently based on branch-2.8, but we are in > the process of moving to 2.10. I checked and YARN-8242 pulls back to > branch-2.10 pretty cleanly. The only conflict was a minor one in > TestNMLeveldbStateStoreService.java. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9995) Code cleanup in TestSchedConfCLI
[ https://issues.apache.org/jira/browse/YARN-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079639#comment-17079639 ] Hadoop QA commented on YARN-9995: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 50s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 49s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} branch-3.2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 54s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}176m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.cli.TestSchedConfCLI | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:11aff6c269f | | JIRA Issue | YARN-9995 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999458/YARN-9995.branch-3.2.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1065f27aeb23 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.2 / 4c63a81 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25840/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25840/testReport/ | | Max. process+thread count | 547 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | | Console output |
[jira] [Updated] (YARN-6838) Add support to LinuxContainerExecutor to support container PAUSE
[ https://issues.apache.org/jira/browse/YARN-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6838: --- Target Version/s: 3.4.0 (was: 3.3.0) > Add support to LinuxContainerExecutor to support container PAUSE > > > Key: YARN-6838 > URL: https://issues.apache.org/jira/browse/YARN-6838 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > > This JIRA tracks the changes needed to the {{LinuxContainerExecutor}}, > {{LinuxContainerRuntime}}, {{DockerLinuxContainerRuntime}} and the > {{container-executor}} linux binary to support container PAUSE using cgroups > freezer module -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6838) Add support to LinuxContainerExecutor to support container PAUSE
[ https://issues.apache.org/jira/browse/YARN-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079638#comment-17079638 ] Brahma Reddy Battula commented on YARN-6838: Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Add support to LinuxContainerExecutor to support container PAUSE > > > Key: YARN-6838 > URL: https://issues.apache.org/jira/browse/YARN-6838 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > > This JIRA tracks the changes needed to the {{LinuxContainerExecutor}}, > {{LinuxContainerRuntime}}, {{DockerLinuxContainerRuntime}} and the > {{container-executor}} linux binary to support container PAUSE using cgroups > freezer module -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10223) Duplicate jersey-test-framework-core dependency in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079615#comment-17079615 ] Brahma Reddy Battula commented on YARN-10223: - [~aajisaka], are you sure this Jira can be targeted to 3.3.1, since the broken YARN-10101 is in 3.3.0? > Duplicate jersey-test-framework-core dependency in yarn-server-common > - > > Key: YARN-10223 > URL: https://issues.apache.org/jira/browse/YARN-10223 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Minor > > The following warning appears in the maven log. > {noformat} > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: > com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar -> > version (?) vs 1.19 @ line 148, column 17 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079614#comment-17079614 ] Brahma Reddy Battula commented on YARN-10063: - Updated the fix version to 3.3.0 for branch-3.3. > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Fix For: 3.3.0, 3.4.0 > > Attachments: YARN-10063.001.patch, YARN-10063.002.patch, > YARN-10063.003.patch, YARN-10063.004.patch > > > YARN-8448/YARN-6586 seem to have introduced new options - "\--http" > (default) and "\--https" - that can be passed to the > container-executor binary, see: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564 > and > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521 > However, the usage output seems to have missed these: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74 > Raising this Jira to improve this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10063: Fix Version/s: (was: 3.3.1) 3.3.0 > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Fix For: 3.3.0, 3.4.0 > > Attachments: YARN-10063.001.patch, YARN-10063.002.patch, > YARN-10063.003.patch, YARN-10063.004.patch > > > YARN-8448/YARN-6586 seem to have introduced new options - "\--http" > (default) and "\--https" - that can be passed to the > container-executor binary, see: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564 > and > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521 > However, the usage output seems to have missed these: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74 > Raising this Jira to improve this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6911) Graph application-level resource utilization in Web UI v2
[ https://issues.apache.org/jira/browse/YARN-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079603#comment-17079603 ] Hadoop QA commented on YARN-6911: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-6911 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6911 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25841/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Graph application-level resource utilization in Web UI v2 > - > > Key: YARN-6911 > URL: https://issues.apache.org/jira/browse/YARN-6911 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-ui-v2 >Affects Versions: 3.0.0-alpha4 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Major > Attachments: Resource Graph Screenshot 2.png, Resource Graph > Screenshot.png, Resource Utilization Graph Mock Up.png, YARN-6911.001.patch, > YARN-6911.002.patch, YARN-6911.003.patch, resource graph in web ui v2.png > > > It would be useful to have a visualization of the resource utilization > (memory, cpu, etc.) per application using the ATSv2 time series data. Rough > mock up attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10227) Pull YARN-8242 back to branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079602#comment-17079602 ] Jim Brennan commented on YARN-10227: Thanks [~epayne]! > Pull YARN-8242 back to branch-2.10 > -- > > Key: YARN-10227 > URL: https://issues.apache.org/jira/browse/YARN-10227 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0, 2.10.1 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Fix For: 2.10.1 > > Attachments: YARN-10227-branch-2.10.001.patch > > > We have recently seen the nodemanager OOM issue reported in YARN-8242 during > a rolling upgrade. Our code is currently based on branch-2.8, but we are in > the process of moving to 2.10. I checked and YARN-8242 pulls back to > branch-2.10 pretty cleanly. The only conflict was a minor one in > TestNMLeveldbStateStoreService.java. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6911) Graph application-level resource utilization in Web UI v2
[ https://issues.apache.org/jira/browse/YARN-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079601#comment-17079601 ] Brahma Reddy Battula commented on YARN-6911: Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Graph application-level resource utilization in Web UI v2 > - > > Key: YARN-6911 > URL: https://issues.apache.org/jira/browse/YARN-6911 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-ui-v2 >Affects Versions: 3.0.0-alpha4 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Major > Attachments: Resource Graph Screenshot 2.png, Resource Graph > Screenshot.png, Resource Utilization Graph Mock Up.png, YARN-6911.001.patch, > YARN-6911.002.patch, YARN-6911.003.patch, resource graph in web ui v2.png > > > It would be useful to have a visualization of the resource utilization > (memory, cpu, etc.) per application using the ATSv2 time series data. Rough > mock up attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6911) Graph application-level resource utilization in Web UI v2
[ https://issues.apache.org/jira/browse/YARN-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6911: --- Target Version/s: 3.4.0 (was: 3.3.0) > Graph application-level resource utilization in Web UI v2 > - > > Key: YARN-6911 > URL: https://issues.apache.org/jira/browse/YARN-6911 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-ui-v2 >Affects Versions: 3.0.0-alpha4 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Major > Attachments: Resource Graph Screenshot 2.png, Resource Graph > Screenshot.png, Resource Utilization Graph Mock Up.png, YARN-6911.001.patch, > YARN-6911.002.patch, YARN-6911.003.patch, resource graph in web ui v2.png > > > It would be useful to have a visualization of the resource utilization > (memory, cpu, etc.) per application using the ATSv2 time series data. Rough > mock up attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6812) Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit
[ https://issues.apache.org/jira/browse/YARN-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079597#comment-17079597 ] Brahma Reddy Battula edited comment on YARN-6812 at 4/9/20, 5:35 PM: - Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. * [|https://issues.apache.org/jira/secure/AddComment!default.jspa?id=13086602] > Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit > - > > Key: YARN-6812 > URL: https://issues.apache.org/jira/browse/YARN-6812 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6812) Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit
[ https://issues.apache.org/jira/browse/YARN-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079597#comment-17079597 ] Brahma Reddy Battula commented on YARN-6812: Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. * [|https://issues.apache.org/jira/secure/AddComment!default.jspa?id=13086602] > Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit > - > > Key: YARN-6812 > URL: https://issues.apache.org/jira/browse/YARN-6812 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6812) Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit
[ https://issues.apache.org/jira/browse/YARN-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-6812: --- Target Version/s: 3.4.0 (was: 3.3.0) > Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit > - > > Key: YARN-6812 > URL: https://issues.apache.org/jira/browse/YARN-6812 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10120) In Federation Router Nodes/Applications/About pages throws 500 exception when https is enabled
[ https://issues.apache.org/jira/browse/YARN-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079589#comment-17079589 ] Brahma Reddy Battula commented on YARN-10120: - I am going to close this issue, as this is merged to 3.3.0 and 3.4.0. If you are planning for other branches, please raise a separate Jira. > In Federation Router Nodes/Applications/About pages throws 500 exception when > https is enabled > -- > > Key: YARN-10120 > URL: https://issues.apache.org/jira/browse/YARN-10120 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Sushanta Sen >Assignee: Bilwa S T >Priority: Critical > Fix For: 3.3.0, 3.4.0 > > Attachments: YARN-10120-YARN-7402.patch, > YARN-10120-YARN-7402.v2.patch, YARN-10120-addendum-01.patch, > YARN-10120-branch-3.3.patch, YARN-10120-branch-3.3.v2.patch, > YARN-10120.001.patch, YARN-10120.002.patch > > > In Federation Router Nodes/Applications/About pages throws 500 exception when > https is enabled. > yarn.router.webapp.https.address =router ip:8091 > {noformat} > 2020-02-07 16:38:49,990 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/apps > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:166) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at > com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1622) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at >