[jira] [Commented] (YARN-10422) Create the script responsible for collecting the bundle data
[ https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212248#comment-17212248 ]

Hadoop QA commented on YARN-10422:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | | {color:red} YARN-10422 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13013317/YARN-10422.POC.002.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/227/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

> Create the script responsible for collecting the bundle data
> ------------------------------------------------------------
>
>                 Key: YARN-10422
>                 URL: https://issues.apache.org/jira/browse/YARN-10422
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: YARN-10422.POC.001.patch, YARN-10422.POC.002.patch
>
> The script should provide the list of diagnostic use-cases described in
> YARN-10421. If a request comes in to the YarnDiagnosticCollector servlet, the
> script will be invoked. It collects all the information required for that
> diagnostic category and saves it into a configurable directory as a
> compressed tar file.
> An example of what the script could look like:
> {code:java}
> if [ "$1" = "listcommonissues" ]; then
>   echo "1, Application Failed"
>   echo "2, Application Hanging"
>   echo "3, Scheduler Related Issue"
>   echo "4, RM failure to start"
>   echo "5, NM failure to start"
> elif [ "$1" = "collect" ]; then
>   if [ "$2" = "1" ]; then
>     appId="$3"
>     mkdir -p "/tmp/$appId"
>     yarn logs -applicationId "$appId" > "/tmp/$appId/joblogs"
>     curl /{appId}/conf > "/tmp/$appId/conf"
>     curl /logs | grep container > "/tmp/$appId/rmlogs"
>     curl /logs | grep container > "/tmp/$appId/nmlogs"
>     outputpath="/tmp/$appId"
>   elif ...
>   elif ...
>   fi
>   # tar and compress outputpath
> fi
> {code}
>
> During class load, YarnDiagnosticsCollector reads the list of common issues
> from the script and keeps it in memory. On every startup of the YARN UI2
> diagnostics page, the page fetches the list from the servlet and displays it.
> The servlet should handle script changes, so if a new diagnostic case is
> added, a YARN UI2 reload should show it. This way users can easily plug in
> new categories without any UI2 or servlet code change.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
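The collection flow described in the YARN-10422 issue text above (list the issue categories, run the commands for one category, write each output into a per-application directory, then tar and compress it) could be sketched in Python roughly as follows. This is only an illustration, not the contents of the attached POC patches: the function names, the `runner` parameter and the archive name are assumptions made here, and only the `yarn logs -applicationId` call is taken from the description. The `curl` endpoints are elided in the description, so they are left out rather than guessed.

```python
import subprocess
import tarfile
from pathlib import Path

# Issue list copied from the pseudocode in the issue description.
COMMON_ISSUES = {
    1: "Application Failed",
    2: "Application Hanging",
    3: "Scheduler Related Issue",
    4: "RM failure to start",
    5: "NM failure to start",
}


def list_common_issues():
    """Return the lines that the 'listcommonissues' command would print."""
    return [f"{num}, {name}" for num, name in sorted(COMMON_ISSUES.items())]


def run_command(argv):
    """Run one external command and return its stdout as bytes."""
    return subprocess.run(argv, stdout=subprocess.PIPE, check=False).stdout


def collect(issue_id, app_id, output_root, runner=run_command):
    """Collect data for one issue category and return the .tar.gz path.

    'runner' is a hypothetical hook so the sketch can be exercised without
    a YARN installation.
    """
    if issue_id != 1:
        raise NotImplementedError("only category 1 is sketched here")
    # Output file name -> command that produces its content.
    commands = {"joblogs": ["yarn", "logs", "-applicationId", app_id]}

    bundle_dir = Path(output_root) / app_id
    bundle_dir.mkdir(parents=True, exist_ok=True)
    for file_name, argv in commands.items():
        (bundle_dir / file_name).write_bytes(runner(argv))

    # Tar and compress the collected directory.
    archive = Path(output_root) / f"{app_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(bundle_dir, arcname=app_id)
    return archive
```

Keeping the category-to-command mapping in plain data is one way to get the "plug in new categories without servlet changes" property the description asks for; whether the real script does this is not stated in the thread.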
[jira] [Updated] (YARN-10431) [Umbrella] Job group management
[ https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jialei weng updated YARN-10431:
-------------------------------
    Attachment:     (was: YarnJobObjectImpl Design.pdf)

> [Umbrella] Job group management
> -------------------------------
>
>                 Key: YARN-10431
>                 URL: https://issues.apache.org/jira/browse/YARN-10431
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.9.2
>            Reporter: jialei weng
>            Priority: Major
>
> In the current YARN job management, we don't have an efficient mechanism to
> manage several jobs together. For example, one batch job may trigger several
> sub-jobs that run at the same time, such as one job to process the data and
> another to monitor job metrics. When we want to cancel these jobs, we
> have to kill them one by one in the current design. I propose a job group
> concept to handle such parent-child jobs as one unit.
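To make the "kill them as one unit" idea in the YARN-10431 description concrete, here is a small client-side sketch. It is not the design from the attached PDF, which is not visible in this thread: the `JobGroup` class and its methods are hypothetical names, and the only real interface used is the standard `yarn application -kill <appId>` CLI command.

```python
import subprocess


def kill_with_yarn_cli(app_id):
    """Kill one application through the standard YARN CLI."""
    subprocess.run(["yarn", "application", "-kill", app_id], check=True)


class JobGroup:
    """Hypothetical client-side grouping of parent and child applications."""

    def __init__(self, group_id, kill_fn=kill_with_yarn_cli):
        self.group_id = group_id
        self._kill_fn = kill_fn
        self._app_ids = []

    def add(self, app_id):
        """Register an application id once, preserving insertion order."""
        if app_id not in self._app_ids:
            self._app_ids.append(app_id)

    def kill_all(self):
        """Try to kill every member; return the ids whose kill call failed."""
        failed = []
        for app_id in self._app_ids:
            try:
                self._kill_fn(app_id)
            except Exception:
                failed.append(app_id)
        return failed
```

A server-side implementation, as the umbrella issue seems to propose, would presumably track group membership in the ResourceManager instead; this sketch only shows the user-visible behaviour being asked for.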
[jira] [Updated] (YARN-10431) [Umbrella] Job group management
[ https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jialei weng updated YARN-10431:
-------------------------------
    Attachment: YarnJobObjectImpl Design.pdf

> [Umbrella] Job group management
> -------------------------------
>
>                 Key: YARN-10431
>                 URL: https://issues.apache.org/jira/browse/YARN-10431
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.9.2
>            Reporter: jialei weng
>            Priority: Major
>
> In the current YARN job management, we don't have an efficient mechanism to
> manage several jobs together. For example, one batch job may trigger several
> sub-jobs that run at the same time, such as one job to process the data and
> another to monitor job metrics. When we want to cancel these jobs, we
> have to kill them one by one in the current design. I propose a job group
> concept to handle such parent-child jobs as one unit.
[jira] [Updated] (YARN-10431) [Umbrella] Job group management
[ https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jialei weng updated YARN-10431:
-------------------------------
    Attachment: YarnJobGroupImpl design.pdf

> [Umbrella] Job group management
> -------------------------------
>
>                 Key: YARN-10431
>                 URL: https://issues.apache.org/jira/browse/YARN-10431
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.9.2
>            Reporter: jialei weng
>            Priority: Major
>         Attachments: YarnJobGroupImpl design.pdf
>
> In the current YARN job management, we don't have an efficient mechanism to
> manage several jobs together. For example, one batch job may trigger several
> sub-jobs that run at the same time, such as one job to process the data and
> another to monitor job metrics. When we want to cancel these jobs, we
> have to kill them one by one in the current design. I propose a job group
> concept to handle such parent-child jobs as one unit.
[jira] [Created] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format.
Gergely Pollak created YARN-10457:
-------------------------------------

             Summary: Add a configuration switch to change between legacy and JSON placement rule format.
                 Key: YARN-10457
                 URL: https://issues.apache.org/jira/browse/YARN-10457
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Gergely Pollak
[jira] [Assigned] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format.
[ https://issues.apache.org/jira/browse/YARN-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gergely Pollak reassigned YARN-10457:
-------------------------------------

    Assignee: Gergely Pollak

> Add a configuration switch to change between legacy and JSON placement rule
> format.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10457
>                 URL: https://issues.apache.org/jira/browse/YARN-10457
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>
[jira] [Commented] (YARN-10420) Update CS MappingRule documentation with the new format and features
[ https://issues.apache.org/jira/browse/YARN-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212411#comment-17212411 ]

Adam Antal commented on YARN-10420:
-----------------------------------

Thanks for the patch [~pbacsko]. I'll attach my reply inline.

1. Ok, let's not touch it then.
2. Can we check what happens, and document it as well? I think users would also be interested in that.
3. Can we also add this to the document?
4,5,6. Ok, got it, thanks.

bq. "If the target queue doesn't exist or and it cannot be created..." - you propose "and" but that would mean that we always try to create a non-existing queue, which is not the case in CS. Under regular parents, queues cannot be created dynamically and CS doesn't even try. Therefore "or" is more appropriate here.

Thanks for the clarification. I suggest adding this to the doc, because I didn't know that "cannot be created" is what you've illustrated as an example. Something like "If the target queue doesn't exist or cannot be created (e.g. under regular parents) ..."

For all the other points, I'm fine.

> Update CS MappingRule documentation with the new format and features
> --------------------------------------------------------------------
>
>                 Key: YARN-10420
>                 URL: https://issues.apache.org/jira/browse/YARN-10420
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10420-001.patch, YARN-10420-002.patch,
> YARN-10420-003.patch, YARN-10420-004.patch, YARN-10420-005.patch
>
> Update the upstream documentation with the new changes.
[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format
[ https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212423#comment-17212423 ]

Adam Antal commented on YARN-10448:
-----------------------------------

Thanks for the patch [~zhuqi], looks good to me.

Could you please double check whether the unit test failures are related? There is also one checkstyle warning remaining. I can commit this if you take care of that.

> SLS should set default user to handle SYNTH format
> --------------------------------------------------
>
>                 Key: YARN-10448
>                 URL: https://issues.apache.org/jira/browse/YARN-10448
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 3.2.1, 3.4.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-10448.001.patch, YARN-10448.002.patch,
> YARN-10448.003.patch, image-2020-10-11-22-01-37-227.png,
> image-2020-10-11-22-02-17-166.png
>
> When using the synthetic generator json file example from the doc (
> https://hadoop.apache.org/docs/current/hadoop-sls/SchedulerLoadSimulator.html#SYNTH_JSON_input_file_format
> ), it throws the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Null user
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
>   at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> So the solution is either:
> 1) to make {{user_name}} a mandatory field, or
> 2) to set a default user in the SLS code if the json file does not define it.
> IMO, solution 2 might be better, because in most cases (if not all)
> {{user_name}} has no impact on scheduler performance, so it is reasonable
> to make it an optional field, which is also consistent with the {{job.user}}
> field in the SLS JSON file.
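Solution 2 from the YARN-10448 description above amounts to a null-safe lookup with a fallback. The actual fix lives in the Java SLS code and is not shown in this thread, so the following is only a Python illustration of the idea: `resolve_user` and the fallback value `"default"` are assumptions made for the sketch, while the `user_name` field name comes from the description.

```python
import json

# Hypothetical fallback name; the value used by the real patch is not
# shown in this thread.
DEFAULT_USER = "default"


def resolve_user(job, default_user=DEFAULT_USER):
    """Return the job's user_name, or a default when it is missing or empty.

    'job' is one parsed job entry from a SYNTH-style JSON trace.
    """
    user = job.get("user_name")
    return user if user else default_user


# A job entry without user_name no longer yields a null user.
job_without_user = json.loads("{}")
job_with_user = json.loads('{"user_name": "alice"}')
print(resolve_user(job_without_user))
print(resolve_user(job_with_user))
```

Treating an empty string the same as a missing field is a choice made in this sketch; the thread does not say how the patch handles that case.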
[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format
[ https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212437#comment-17212437 ]

zhuqi commented on YARN-10448:
------------------------------

CC [~adam.antal]

Thanks for your patient review and commit.

The unit test failures are not related to the patch, and I have fixed the checkstyle warning in the new patch.

> SLS should set default user to handle SYNTH format
> --------------------------------------------------
>
>                 Key: YARN-10448
>                 URL: https://issues.apache.org/jira/browse/YARN-10448
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 3.2.1, 3.4.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-10448.001.patch, YARN-10448.002.patch,
> YARN-10448.003.patch, YARN-10448.004.patch,
> image-2020-10-11-22-01-37-227.png, image-2020-10-11-22-02-17-166.png
>
> When using the synthetic generator json file example from the doc (
> https://hadoop.apache.org/docs/current/hadoop-sls/SchedulerLoadSimulator.html#SYNTH_JSON_input_file_format
> ), it throws the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Null user
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
>   at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> So the solution is either:
> 1) to make {{user_name}} a mandatory field, or
> 2) to set a default user in the SLS code if the json file does not define it.
> IMO, solution 2 might be better, because in most cases (if not all)
> {{user_name}} has no impact on scheduler performance, so it is reasonable
> to make it an optional field, which is also consistent with the {{job.user}}
> field in the SLS JSON file.
[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Brennan updated YARN-9667:
------------------------------
    Fix Version/s: 2.10.2

> Container-executor.c duplicates messages to stdout
> --------------------------------------------------
>
>                 Key: YARN-9667
>                 URL: https://issues.apache.org/jira/browse/YARN-9667
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Peter Bacsko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.5, 2.10.2
>
>         Attachments: YARN-9667-001.patch, YARN-9667-branch-2.10.001.patch,
> YARN-9667-branch-3.2.001.patch
>
> When a container is killed by its AM we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script
> paths..." part, yet in the Stdout log we can clearly see that it has been
> written twice. After consulting with [~pbacsko] it seems that there is a
> missing flush in container-executor.c before the fork, and that causes the
> duplication.
> I suggest adding a flush there so that the output is not duplicated: it is a
> bit misleading that the child process writes out "Getting exit code file" and
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the
> code and change them to a single call, so that the fflush calls cannot be
> forgotten accidentally. (A forgotten flush can cause problems in every place
> where the pair is used.)
> Note: this issue probably affects every occurrence of fork(), not just the one
> from {{launch_container_as_user}} in {{main.c}}.
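The mechanism described in the YARN-9667 issue text above, where buffered output written before `fork()` is copied into the child and then flushed by both processes, is not specific to C stdio. The sketch below reproduces the same effect with Python's buffered `sys.stdout`, since the real fix in container-executor.c is not shown in this thread. The helper name `count_lines` and the inline child script are made up for the demonstration, and it assumes a POSIX system where `os.fork()` is available.

```python
import os
import subprocess
import sys

# Child program: write one line to a block-buffered stdout, optionally flush,
# then fork. Both processes flush their copy of the buffer afterwards.
SCRIPT = r"""
import os, sys
flush_before_fork = sys.argv[1] == "flush"
sys.stdout.write("Creating script paths...\n")
if flush_before_fork:
    sys.stdout.flush()
pid = os.fork()
if pid == 0:
    sys.stdout.flush()  # child emits whatever was still buffered at fork time
    os._exit(0)
os.waitpid(pid, 0)
sys.stdout.flush()
"""


def count_lines(mode):
    """Run the child program with stdout piped and count the message copies."""
    # Drop PYTHONUNBUFFERED so the child's stdout really is buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT, mode],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode().count("Creating script paths...")


if __name__ == "__main__":
    print("without flush:", count_lines("noflush"))
    print("with flush:   ", count_lines("flush"))
```

Wrapping the write and the flush in one helper, as the description suggests for the fprintf-fflush pairs, removes the chance of forgetting the second call.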
[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format
[ https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212476#comment-17212476 ]

Hadoop QA commented on YARN-10448:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 12s{color} | | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 23s{color} | | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 22s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 47s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 38s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 54s
[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212524#comment-17212524 ]

Eric Badger commented on YARN-9667:
-----------------------------------

Thanks, [~Jim_Brennan]!

> Container-executor.c duplicates messages to stdout
> --------------------------------------------------
>
>                 Key: YARN-9667
>                 URL: https://issues.apache.org/jira/browse/YARN-9667
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Peter Bacsko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.5, 2.10.2
>
>         Attachments: YARN-9667-001.patch, YARN-9667-branch-2.10.001.patch,
> YARN-9667-branch-3.2.001.patch
>
> When a container is killed by its AM we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script
> paths..." part, yet in the Stdout log we can clearly see that it has been
> written twice. After consulting with [~pbacsko] it seems that there is a
> missing flush in container-executor.c before the fork, and that causes the
> duplication.
> I suggest adding a flush there so that the output is not duplicated: it is a
> bit misleading that the child process writes out "Getting exit code file" and
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the
> code and change them to a single call, so that the fflush calls cannot be
> forgotten accidentally. (A forgotten flush can cause problems in every place
> where the pair is used.)
> Note: this issue probably affects every occurrence of fork(), not just the one
> from {{launch_container_as_user}} in {{main.c}}.
[jira] [Updated] (YARN-10422) Create the script responsible for collecting the bundle data
[ https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-10422:
---------------------------------
    Attachment: YARN-10422.POC.003.patch

> Create the script responsible for collecting the bundle data
> ------------------------------------------------------------
>
>                 Key: YARN-10422
>                 URL: https://issues.apache.org/jira/browse/YARN-10422
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: YARN-10422.POC.001.patch, YARN-10422.POC.002.patch,
> YARN-10422.POC.003.patch
>
> The script should provide the list of diagnostic use-cases described in
> YARN-10421. If a request comes in to the YarnDiagnosticCollector servlet, the
> script will be invoked. It collects all the information required for that
> diagnostic category and saves it into a configurable directory as a
> compressed tar file.
> An example of what the script could look like:
> {code:java}
> if [ "$1" = "listcommonissues" ]; then
>   echo "1, Application Failed"
>   echo "2, Application Hanging"
>   echo "3, Scheduler Related Issue"
>   echo "4, RM failure to start"
>   echo "5, NM failure to start"
> elif [ "$1" = "collect" ]; then
>   if [ "$2" = "1" ]; then
>     appId="$3"
>     mkdir -p "/tmp/$appId"
>     yarn logs -applicationId "$appId" > "/tmp/$appId/joblogs"
>     curl /{appId}/conf > "/tmp/$appId/conf"
>     curl /logs | grep container > "/tmp/$appId/rmlogs"
>     curl /logs | grep container > "/tmp/$appId/nmlogs"
>     outputpath="/tmp/$appId"
>   elif ...
>   elif ...
>   fi
>   # tar and compress outputpath
> fi
> {code}
>
> During class load, YarnDiagnosticsCollector reads the list of common issues
> from the script and keeps it in memory. On every startup of the YARN UI2
> diagnostics page, the page fetches the list from the servlet and displays it.
> The servlet should handle script changes, so if a new diagnostic case is
> added, a YARN UI2 reload should show it. This way users can easily plug in
> new categories without any UI2 or servlet code change.
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212547#comment-17212547 ]

Jim Brennan commented on YARN-10450:
------------------------------------

Anyone else available to review? [~jhung], [~ebadger]?

> Add cpu and memory utilization per node and cluster-wide metrics
> ----------------------------------------------------------------
>
>                 Key: YARN-10450
>                 URL: https://issues.apache.org/jira/browse/YARN-10450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.3.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>         Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
> Add metrics to show actual cpu and memory utilization for each node and
> aggregated for the entire cluster. This information is already passed
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it
> useful to be able to quickly see the actual cpu/memory utilization on the
> node/cluster. It's especially useful if some form of overcommit is used.
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212557#comment-17212557 ]

Eric Badger commented on YARN-10450:
------------------------------------

I'll review it.

> Add cpu and memory utilization per node and cluster-wide metrics
> ----------------------------------------------------------------
>
>                 Key: YARN-10450
>                 URL: https://issues.apache.org/jira/browse/YARN-10450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.3.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>         Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
> Add metrics to show actual cpu and memory utilization for each node and
> aggregated for the entire cluster. This information is already passed
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it
> useful to be able to quickly see the actual cpu/memory utilization on the
> node/cluster. It's especially useful if some form of overcommit is used.
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212632#comment-17212632 ]

Eric Badger commented on YARN-10450:
------------------------------------

The patch itself looks good to me. However, I'm wondering if "Mem Utilization" is the correct phrase to convey what we mean. To me this means "Mem Used" / "Mem Avail". But in this case it's the actual utilization of the node. And "Mem Used" isn't really the actual memory that's being used. It's the memory that is allocated to that node via YARN.

[~Jim_Brennan], [~epayne] do you have any thoughts on making this terminology a little clearer in the UI?

> Add cpu and memory utilization per node and cluster-wide metrics
> ----------------------------------------------------------------
>
>                 Key: YARN-10450
>                 URL: https://issues.apache.org/jira/browse/YARN-10450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.3.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>         Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
> Add metrics to show actual cpu and memory utilization for each node and
> aggregated for the entire cluster. This information is already passed
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it
> useful to be able to quickly see the actual cpu/memory utilization on the
> node/cluster. It's especially useful if some form of overcommit is used.
[jira] [Commented] (YARN-10422) Create the script responsible for collecting the bundle data
[ https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212650#comment-17212650 ]

Hadoop QA commented on YARN-10422:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 15s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 6s | | trunk passed |
| +1 | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 0m 51s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 15m 32s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 35s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 52s | | the patch passed |
| +1 | compile | 0m 52s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 0m 52s | | the patch passed |
| +1 | compile | 0m 43s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 0m 43s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 | mvnsite | 0m 50s | | the patch passed |
| -0 | pylint | 0m 4s | [/diff-pylint.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/diff-pylint.txt] | The patch generated 42 new + 0 unchanged - 0 fixed = 42 total (was 0) |
| -1 | shadedclient | 15m 44s | [/patch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/patch-shadedclient.txt] | patch has errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 36s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 31s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
|| || || || Other Tests || ||
| -1 | unit | 95m 2s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt] | hadoop-yarn
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212662#comment-17212662 ]

Jim Brennan commented on YARN-10450:

Thanks for the review and comments, [~ebadger]! I agree the names could be clearer. I'm not sure if we should change *Mem Used* because, even though I agree it could be more accurate, it has been called that for a long time. I'm definitely open to changing the name for *Mem Utilization %*, which in the Cluster Metrics is the actual memory utilization percentage for all nodes in the cluster, and in the Node Metrics is the actual memory utilization percentage for the node. Maybe it should be something like *Physical Mem Used %* / *Physical VCores Used %*? [~epayne], [~jhung], what do you think?

> Add cpu and memory utilization per node and cluster-wide metrics
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.3.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
> Add metrics to show actual cpu and memory utilization for each node and aggregated for the entire cluster. This information is already passed from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it useful to be able to quickly see the actual cpu/memory utilization on the node/cluster. It's especially useful if some form of overcommit is used.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
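To make the per-node versus cluster-wide distinction in the comment above concrete, here is a minimal sketch of how such percentages could be computed. It is not taken from the YARN-10450 patch: the `NodeUtilization` type, its field names, and the choice to aggregate by summing used and total physical memory across nodes (rather than averaging the per-node percentages) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeUtilization:
    """Hypothetical per-node sample; these field names are not YARN's API."""
    node_id: str
    physical_mem_used_mb: int
    physical_mem_total_mb: int


def node_physical_mem_used_pct(node: NodeUtilization) -> float:
    """Physical memory actually in use on one node, as a percentage."""
    if node.physical_mem_total_mb <= 0:
        return 0.0  # avoid dividing by zero for a node reporting no memory
    return 100.0 * node.physical_mem_used_mb / node.physical_mem_total_mb


def cluster_physical_mem_used_pct(nodes: Iterable[NodeUtilization]) -> float:
    """Cluster-wide percentage, assumed here to be total used over total capacity."""
    nodes = list(nodes)
    total_mb = sum(n.physical_mem_total_mb for n in nodes)
    if total_mb <= 0:
        return 0.0  # empty cluster or no capacity reported
    used_mb = sum(n.physical_mem_used_mb for n in nodes)
    return 100.0 * used_mb / total_mb


if __name__ == "__main__":
    # Made-up sample values, purely for demonstration.
    sample = [
        NodeUtilization("node-a", physical_mem_used_mb=2048, physical_mem_total_mb=8192),
        NodeUtilization("node-b", physical_mem_used_mb=12288, physical_mem_total_mb=16384),
    ]
    for n in sample:
        print(f"{n.node_id}: {node_physical_mem_used_pct(n):.1f}%")
    print(f"cluster: {cluster_physical_mem_used_pct(sample):.1f}%")
```

Summing before dividing weights larger nodes more heavily than a plain average of per-node percentages would; whether the actual patch aggregates this way is not stated in this thread.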
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212676#comment-17212676 ] Jonathan Hung commented on YARN-10450: -- [~Jim_Brennan], Physical Mem Used % makes sense to me. We also refer to this as "Memory Efficiency" internally.