[jira] [Commented] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832093#comment-16832093 ] Eric Yang commented on YARN-9523: - [~jeagles] no one disputed using the "dist" profile for building the docker image in the email thread. Hence, patch 001 just revises the build according to [~ste...@apache.org]'s suggestion. > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9523.001.patch > > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9523: Attachment: YARN-9523.001.patch > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9523.001.patch > > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
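For context, the -DskipDocker switch in the description can be implemented as a plugin-level skip inside the "dist" profile. The pom.xml fragment below is a hypothetical sketch only — the exec-maven-plugin binding, the image name, and the {{skipDocker}} property are assumptions, not necessarily what YARN-9523.001.patch does:

```xml
<!-- Hypothetical sketch: build the application catalog docker image during
     "mvn package -Pdist", unless -DskipDocker is passed on the CLI.
     A <skipDocker>false</skipDocker> default would live in <properties>. -->
<profile>
  <id>dist</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>build-appcatalog-docker-image</id>
            <phase>package</phase>
            <goals><goal>exec</goal></goals>
            <configuration>
              <!-- "mvn ... -DskipDocker" sets the property, skipping this step -->
              <skip>${skipDocker}</skip>
              <executable>docker</executable>
              <arguments>
                <argument>build</argument>
                <argument>-t</argument>
                <argument>hadoop/appcatalog-docker</argument>
                <argument>.</argument>
              </arguments>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With this shape, {{mvn clean package -Pnative,dist}} builds the image as part of the distribution, while adding -DskipDocker skips only the docker step.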
[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832077#comment-16832077 ] Eric Yang commented on YARN-9524: - Test cases look fine, but I still can't get Tracking URL: History to work when accessing job history server. > TestAHSWebServices and TestLogsCLI test case failures > - > > Key: YARN-9524 > URL: https://issues.apache.org/jira/browse/YARN-9524 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: Regression > Attachments: YARN-9524-001.patch, YARN-9524-002.patch > > > {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929. > {code:java} > [ERROR] Failures: > [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014 > [ERROR] Errors: > [ERROR] > TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 > » NullPointer > [INFO] > [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1 > [ERROR] Failures: > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] Errors: > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j... > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type
[ https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831938#comment-16831938 ] Eric Payne commented on YARN-9285: -- [~ahussein], branch-3 is not a valid branch, so it won't need a patch. However, the trunk patch doesn't backport cleanly to branch-3.0. Can you please provide a patch for branch-3.0? > RM UI progress column is of wrong type > -- > > Key: YARN-9285 > URL: https://issues.apache.org/jira/browse/YARN-9285 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.2, 2.8.6, 2.9.3 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Labels: bug > Attachments: YARN-9285-branch-2.8.001.patch, > YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, > YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch > > > The column type assigned for progress column in the application report is not > correct. > The rank of the progress column should be 16, and 18. In WebPageUtils.java > the "atargets" needs to be incremented by 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type
[ https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831920#comment-16831920 ] Hudson commented on YARN-9285: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16495 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16495/]) YARN-9285: RM UI progress column is of wrong type. Contributed by Ahmed (ericp: rev b094b94d43a46af9ddb910da24f792b95f614b08) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > RM UI progress column is of wrong type > -- > > Key: YARN-9285 > URL: https://issues.apache.org/jira/browse/YARN-9285 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.2, 2.8.6, 2.9.3 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Labels: bug > Attachments: YARN-9285-branch-2.8.001.patch, > YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, > YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch > > > The column type assigned for progress column in the application report is not > correct. > The rank of the progress column should be 16, and 18. In WebPageUtils.java > the "atargets" needs to be incremented by 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler
[ https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831916#comment-16831916 ] Íñigo Goiri commented on YARN-9505: --- Thanks [~abmodi] for the fixes. I think we need to have more comments. In particular, {{allocateLatencyOQuantiles}} should have the metric annotation explaining what it represents. It has it on the {{initialize()}} but I'm not sure it is the same; in any case, it should have the same format as the others with "Aggregate # of...". Regarding the test, a few minor comments: * Can we use {{Collections#emptyList()}} instead of {{new ArrayList}}? * The test could use some high level comments too. * There are a bunch of things that are used over and over, we could do: ** EXEC_OPPORTUNISTIC = ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true) ** RESOURCE_1GB = Resources.createResource(1 * GB) ** PRIORITY_1 = Priority.newInstance(1) > Add container allocation latency for Opportunistic Scheduler > > > Key: YARN-9505 > URL: https://issues.apache.org/jira/browse/YARN-9505 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9505.001.patch, YARN-9505.002.patch, > YARN-9505.003.patch > > > This will help in tuning the opportunistic scheduler and its configuration > parameters. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type
[ https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831906#comment-16831906 ] Eric Payne commented on YARN-9285: -- +1 LGTM Thanks [~ahussein], will commit shortly. > RM UI progress column is of wrong type > -- > > Key: YARN-9285 > URL: https://issues.apache.org/jira/browse/YARN-9285 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.2, 2.8.6, 2.9.3 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Labels: bug > Attachments: YARN-9285-branch-2.8.001.patch, > YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, > YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch > > > The column type assigned for progress column in the application report is not > correct. > The rank of the progress column should be 16, and 18. In WebPageUtils.java > the "atargets" needs to be incremented by 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831903#comment-16831903 ] Jim Brennan commented on YARN-9527: --- I was able to find a node where the problem was actively happening, so I grabbed a heap dump of the nodemanager process and saved off the NM logs. From this, I was able to figure out what was happening. This sequence of events matches several other logs that we have examined. Note that this analysis was done on our internal version of branch-2.8, but based on code inspection, I believe the problem still exists in trunk.

*Sequence of events, with relevant logs:*

Container transitions from NEW to LOCALIZING
{noformat}
2019-04-26 05:24:43,356 [AsyncDispatcher event handler] INFO container.ContainerImpl: Container container_e29_1550394211378_12160590_01_08 transitioned from NEW to LOCALIZING
{noformat}
* ContainerImpl.RequestResourcesTransition sends a ContainerLocalizationRequestEvent to ResourceLocalizationService (INIT_CONTAINER_RESOURCES)
* ResourceLocalizationService.handleInitContainerResources() sends a ResourceRequestEvent for each LocalResourceRequest to LocalResourcesTrackerImpl (REQUEST); in this case, there are 11 resources

*Container transitions from LOCALIZING to KILLING (before we process any of these resources in LocalizerTracker)*
{noformat}
2019-04-26 05:24:43,356 [AsyncDispatcher event handler] INFO container.ContainerImpl: Container container_e29_1550394211378_12160590_01_08 transitioned from LOCALIZING to KILLING
{noformat}
* ContainerImpl.KillDuringLocalizationTransition: container.cleanup() collects the list of privateRsrcs for this container and sends a ContainerLocalizationCleanup event
* ResourceLocalizationService.handleCleanupContainerResources()
** For each resource, sends a ResourceReleaseEvent to LocalResourcesTrackerImpl (RELEASE)
** LocalizerTracker.cleanupPrivLocalizers() (called directly)
*** Gets the LocalizerRunner for this container from privLocalizers. *Because we have not yet handled any LocalizerResourceRequestEvents for this container, we don’t find a LocalizerRunner, so we just return*
** Deletes the container directories. Sends CONTAINER_RESOURCES_CLEANEDUP event to ContainerImpl

LocalResourcesTrackerImpl thread processes event queue
* LocalResourcesTrackerImpl.handle creates new LocalizedResources and adds them to the localrsrc map (state is INIT)
* LocalizedResource.FetchResourceTransition
** Adds this container to refs
** Sends LocalizerResourceRequestEvent to LocalizerTracker
** State changes to DOWNLOADING
{noformat}
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_common_ws-1.2.27.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_common_grid.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_reporting_cdw_common.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/yjava_http_client-0.3.23.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/jcontrib_degrading_stats_util-0.1.17.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_batch_service_client-1.2.16.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/json-smart-1.0.6.3.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/async-http-client-0.3.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_cdw_cow_loader.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO localizer.LocalizedResource: Resource hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/nct.jar transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event
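The window described above — the cleanup event being handled before any LocalizerResourceRequestEvent for the same container — can be modeled schematically in a few lines. This is plain Python, not YARN code; the names and the two-event queue are illustrative only:

```python
# Schematic model of the dispatch ordering described above -- not YARN code.
# A REQUEST lazily creates a LocalizerRunner; CLEANUP removes it if present.
# When CLEANUP is processed first, the runner created later is orphaned.
runners = {}  # container id -> "LocalizerRunner"

def on_request(cid):
    # REQUEST: lazily creates a runner for the container
    runners.setdefault(cid, "LocalizerRunner")

def on_cleanup(cid):
    # CLEANUP: removes the runner if one exists; silently returns otherwise
    runners.pop(cid, None)

# Observed ordering: CLEANUP handled before any of the REQUESTs
for event, cid in [("CLEANUP", "c08"), ("REQUEST", "c08")]:
    on_cleanup(cid) if event == "CLEANUP" else on_request(cid)

orphaned = "c08" in runners  # True: a runner exists, but no cleanup will arrive
print(orphaned)
```

Once the container is already in KILLING, no further ContainerLocalizationCleanup is sent, so the late-created runner keeps downloading.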
[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831884#comment-16831884 ] ggg commented on YARN-9526: --- [~adam.antal] I'd go one step further, if I may: a more meaningful message would be good, followed by a note that log aggregation is disabled, and not dying. Or possibly falling back to TFile settings (and not dying). > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831883#comment-16831883 ] Jim Brennan commented on YARN-9527: --- For example, we recently had a case where all of the disks used by yarn were full:
{noformat}
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sdb4     5776759588 5714378904    4561576 100% /grid/1
/dev/sdd2     5840971776 5775661160    6849008 100% /grid/3
/dev/sdc2     5840971776 5777982304    4527864 100% /grid/2
/dev/sda4     5776759588 5712614448    6326032 100% /grid/0
{noformat}
Upon investigation, we found the NM log full of the “Invalid event: LOCALIZED at LOCALIZED” exceptions for a file called creative.data, and we found 2229 copies of that file in the usercache for the user:
{noformat}
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/19/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/100014/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/100024/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100189/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100199/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100214/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100229/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100244/creative.data
…
{noformat}
We had a record of a similar problem reported back in September of 2017. I scanned our clusters to see how often this was happening. On some clusters, there were a significant number of nodes where this “LOCALIZED at LOCALIZED” exception had occurred. For example, on one cluster there were 122 nodes where I found that log message, some nodes with a large number:
{noformat}
12566 node585n18:
15053 node585n30:
15819 node262n14:
36182 node582n24:
42623 node585n28:
7 node586n24:
47380 node588n03:
234528 node582n01:
494196 node221n32:
688038 node221n01:
1210223 node1442n30:
1306207 node194n06:
1331739 node1442n21:
1366933 node588n37:
1718461 node583n22:
2050377 node588n33:
2252679 node287n05:
{noformat}
> Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file > - > > Key: YARN-9527 > URL: https://issues.apache.org/jira/browse/YARN-9527 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.5, 3.1.2 >Reporter: Jim Brennan >Priority: Major > > A rogue ContainerLocalizer can get stuck in a loop continuously downloading > the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" > exception on each iteration. Sometimes this continues long enough that it > fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
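The cluster scan mentioned above can be approximated by counting occurrences of the exception message per NM log. A toy, self-contained version in plain Python (the log contents are stand-ins; a real scan would read each node's NM log files):

```python
# Toy version of the scan: count "Invalid event: LOCALIZED at LOCALIZED"
# occurrences per NM log. The sample log text here is illustrative only.
needle = "Invalid event: LOCALIZED at LOCALIZED"
sample_logs = {
    "node585n18": (needle + "\n") * 3 + "container finished\n",
    "node586n24": "container finished\n",
}
counts = {node: text.count(needle) for node, text in sample_logs.items()}
for node, n in sorted(counts.items()):
    print(node, n)
```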
[jira] [Created] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
Jim Brennan created YARN-9527: - Summary: Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file Key: YARN-9527 URL: https://issues.apache.org/jira/browse/YARN-9527 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.1.2, 2.8.5 Reporter: Jim Brennan A rogue ContainerLocalizer can get stuck in a loop continuously downloading the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" exception on each iteration. Sometimes this continues long enough that it fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831858#comment-16831858 ] Adam Antal commented on YARN-9526: -- I'm +1 on making a more meaningful message instead of java.util.NoSuchElementException. It would make it much clearer for the end user if we raised a more descriptive exception in that case. > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ggg resolved YARN-9526. --- Resolution: Not A Problem > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831845#comment-16831845 ] ggg commented on YARN-9526: --- [~Prabhu Joseph], you're spot on in that adding these two properties fixed the issue:
{noformat}
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
<property>
  <name>yarn.log-aggregation.file-controller.TFile.class</name>
  <value>org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController</value>
</property>
{noformat}
Digging deeper, it turns out I had mistakenly linked a (much) older yarn-default.xml to etc/hadoop, and that file did not have those properties defined. My bug. Apologies for raising this issue, and many thanks for helping me out here. > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9525) TFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831783#comment-16831783 ] Adam Antal commented on YARN-9525: -- The related code:
{code:java}
Path aggregatedLogFile = null;
if (context.isLogAggregationInRolling()) {
  aggregatedLogFile = initializeWriterInRolling(
      remoteLogFile, appId, nodeId);
} else {
  aggregatedLogFile = remoteLogFile;
  fsDataOStream = fc.create(remoteLogFile,
      EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
      new Options.CreateOpts[] {});
  if (uuid == null) {
    uuid = createUUID(appId);
  }
  fsDataOStream.write(uuid);
  fsDataOStream.flush();
}

long aggregatedLogFileLength = fc.getFileStatus(
    aggregatedLogFile).getLen();
// append a simple character("\n") to move the writer cursor, so
// we could get the correct position when we call
// fsOutputStream.getStartPos()
final byte[] dummyBytes = "\n".getBytes(Charset.forName("UTF-8"));
fsDataOStream.write(dummyBytes);
fsDataOStream.flush();

if (fsDataOStream.getPos() >= (aggregatedLogFileLength
    + dummyBytes.length)) {
  currentOffSet = 0;
} else {
  currentOffSet = aggregatedLogFileLength;
}
{code}
As far as I can see, the getFileStatus call can be omitted; it is also used to position the cursor. > TFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time.
> java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: No such file or directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 7 more > {noformat} > This stack trace point to > {{LogAggregationIndexedFileController$initializeWriter}} where we do the > following steps (in a non-rolling log aggregation setup): > - create FSDataOutputStream > - writing out a UUID > - flushing > - immediately after that we call a
[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831779#comment-16831779 ] Prabhu Joseph commented on YARN-9526: - [~ppp] This happens if neither yarn-default.xml nor yarn-site.xml has {{yarn.log-aggregation.file-formats}} defined. By default, the yarn-default.xml that is part of hadoop-yarn-api-3.2.0.jar has this config defined. It looks like there is a yarn-default.xml in the NM classpath without this config. Can you try adding the below in yarn-site.xml?
{code}
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
{code}
We can log a meaningful message instead of java.util.NoSuchElementException. > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831780#comment-16831780 ] Hadoop QA commented on YARN-9524: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 4s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 51s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 54s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}122m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9524 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967653/YARN-9524-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 07963a318d63 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a42745 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24043/testReport/ | | Max. process+thread count | 692 (vs. ulimit of 1) | | modules | C:
[jira] [Updated] (YARN-9526) NM invariably dies if log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ggg updated YARN-9526: -- Attachment: yarn-site.xml > NM invariably dies if log aggregation is enabled > > > Key: YARN-9526 > URL: https://issues.apache.org/jira/browse/YARN-9526 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 > Environment: Binary 3.2.0 hadoop release >Reporter: ggg >Priority: Major > Attachments: nm.log, yarn-site.xml > > > NM dies as soon as first task is scheduled if log aggregation is enabled. Log > attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9526) NM invariably dies if log aggregation is enabled
ggg created YARN-9526: - Summary: NM invariably dies if log aggregation is enabled Key: YARN-9526 URL: https://issues.apache.org/jira/browse/YARN-9526 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 3.2.0 Environment: Binary 3.2.0 hadoop release Reporter: ggg Attachments: nm.log NM dies as soon as first task is scheduled if log aggregation is enabled. Log attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow
[ https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831758#comment-16831758 ] Hadoop QA commented on YARN-9508: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 50s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 135 unchanged - 0 fixed = 137 total (was 135) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerOvercommit | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9508 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967649/YARN-9508-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 01861472abb4 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4605db3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24042/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit |
[jira] [Assigned] (YARN-9525) TFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-9525: Assignee: Adam Antal > TFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time. > java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: No such file or directory: > 
s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 7 more > {noformat} > This stack trace points to > {{LogAggregationIndexedFileController$initializeWriter}}, where we do the > following steps (in a non-rolling log aggregation setup): > - create an FSDataOutputStream > - write out a UUID > - flush > - immediately after that, call GetFileStatus to get the length of the log > file (the bytes we just wrote out); that's where the failure happens: > the file is not there yet due to eventual consistency. > Maybe we can get rid of that step, so we can use the IFile format against an s3a target. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9525) TFile format is not working against s3a remote folder
Adam Antal created YARN-9525: Summary: TFile format is not working against s3a remote folder Key: YARN-9525 URL: https://issues.apache.org/jira/browse/YARN-9525 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 3.1.2 Reporter: Adam Antal Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} configured to an s3a URI throws the following exception during log aggregation: {noformat} Cannot create writer for app application_1556199768861_0001. Skip log upload this time. java.io.IOException: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) ... 7 more {noformat} This stack trace points to {{LogAggregationIndexedFileController$initializeWriter}}, where we do the following steps (in a non-rolling log aggregation setup): - create an FSDataOutputStream - write out a UUID - flush - immediately after that, call GetFileStatus to get the length of the log file (the bytes we just wrote out); that's where the failure happens: the file is not there yet due to eventual consistency. Maybe we can get rid of that step, so we can use the IFile format against an s3a target. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
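The failing write-flush-stat sequence described in YARN-9525 can be sketched with the plain JDK against a local filesystem. This is an illustrative reconstruction, not the actual Hadoop code: `WriteThenStat` and `writeUuidAndStat` are hypothetical names, and `Files.size()` stands in for `FileContext.getFileStatus(...).getLen()` in the controller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class WriteThenStat {
    // Mirrors the initializeWriter() sequence from the stack trace:
    // open a stream, write a UUID, flush, then immediately stat the file
    // to learn how many bytes landed. On a local filesystem the stat
    // succeeds; on an eventually consistent store such as S3A the object
    // may not be visible yet, and the stat step can throw
    // FileNotFoundException -- exactly the failure reported above.
    static long writeUuidAndStat(Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(UUID.randomUUID().toString()
                    .getBytes(StandardCharsets.UTF_8));
            out.flush();
            // The problematic read-after-write: stat the file we just
            // flushed to get its current length.
            return Files.size(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ifile-uuid", ".log");
        // A canonical UUID string is 36 characters (36 UTF-8 bytes).
        System.out.println(writeUuidAndStat(tmp));
        Files.delete(tmp);
    }
}
```

Removing the stat (or deferring it until after close) would sidestep the read-after-write consistency requirement, which is the direction the issue suggests.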
[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9524: Attachment: YARN-9524-002.patch > TestAHSWebServices and TestLogsCLI test case failures > - > > Key: YARN-9524 > URL: https://issues.apache.org/jira/browse/YARN-9524 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: Regression > Attachments: YARN-9524-001.patch, YARN-9524-002.patch > > > {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929. > {code:java} > [ERROR] Failures: > [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014 > [ERROR] Errors: > [ERROR] > TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 > » NullPointer > [INFO] > [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1 > [ERROR] Failures: > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] Errors: > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j... > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831669#comment-16831669 ] Tao Yang commented on YARN-9440: Thanks [~cheersyang] for the review. The remaining checkstyle warnings are all about ParameterNumber (more than 7 parameters), where the parameters are required. The UT failures seem unrelated to this patch and are hard to reproduce in my local environment. > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831668#comment-16831668 ] Hadoop QA commented on YARN-9477: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 12s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 46s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Hard coded reference to an absolute pathname in
[jira] [Updated] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow
[ https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-9508: Attachment: YARN-9508-001.patch > YarnConfiguration areNodeLabel enabled is costly in allocation flow > --- > > Key: YARN-9508 > URL: https://issues.apache.org/jira/browse/YARN-9508 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Critical > Attachments: YARN-9508-001.patch > > > Locking can be avoided for every allocate request, improving performance: > {noformat} > "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 > waiting for monitor entry [0x7f1ec6a8d000] > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674) > at > org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427) > - locked <0x7f24dd3f9e40> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348) > at > org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212) > at > org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
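The thread dump in YARN-9508 shows every allocate call blocking on the monitor inside the synchronized `Configuration.getProps()`. One way to avoid that, sketched here in plain Java, is to read the flag once and cache it. The class and property names below are illustrative stand-ins, not the actual patch: `Boolean.getBoolean` on a system property substitutes for `YarnConfiguration.areNodeLabelsEnabled()`.

```java
public class NodeLabelsFlagCache {
    private volatile Boolean nodeLabelsEnabled; // null until first read

    // Stand-in for YarnConfiguration.areNodeLabelsEnabled(), which goes
    // through the synchronized Configuration.getProps() on every call.
    private boolean readFromConfiguration() {
        return Boolean.getBoolean("yarn.node-labels.enabled");
    }

    public boolean areNodeLabelsEnabled() {
        Boolean cached = nodeLabelsEnabled;
        if (cached == null) {
            // Benign race: concurrent first callers all compute and store
            // the same value; later calls never touch the lock again.
            cached = readFromConfiguration();
            nodeLabelsEnabled = cached;
        }
        return cached;
    }

    public static void main(String[] args) {
        System.setProperty("yarn.node-labels.enabled", "true");
        NodeLabelsFlagCache cache = new NodeLabelsFlagCache();
        System.out.println(cache.areNodeLabelsEnabled()); // true
        // Flipping the source afterwards does not re-trigger the read:
        System.setProperty("yarn.node-labels.enabled", "false");
        System.out.println(cache.areNodeLabelsEnabled()); // still true
    }
}
```

The trade-off is that the cached flag no longer reflects later configuration changes, which is acceptable for a setting that is fixed for the lifetime of the ResourceManager.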
[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831651#comment-16831651 ] Prabhu Joseph commented on YARN-9524: - [~eyang] Can you review this Jira when you get time. Thanks. > TestAHSWebServices and TestLogsCLI test case failures > - > > Key: YARN-9524 > URL: https://issues.apache.org/jira/browse/YARN-9524 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: Regression > Attachments: YARN-9524-001.patch > > > {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929. > {code:java} > [ERROR] Failures: > [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014 > [ERROR] Errors: > [ERROR] > TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 > » NullPointer > [INFO] > [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1 > [ERROR] Failures: > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624 > [ERROR] Errors: > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j... > [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » > WebApplication j...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831631#comment-16831631 ] Peter Bacsko commented on YARN-9477: [~snemeth] could you please check patch v1? > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, > YARN-9477-POC2.patch, YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831605#comment-16831605 ] Peter Bacsko edited comment on YARN-9477 at 5/2/19 1:09 PM: I uploaded the first version of this. I immediately realized two things: # {{filter(p -> p.getFileName().startsWith("veslot"))}} - this doesn't work properly, needs to be changed # {{assertTrue("Device should not be healthy", device.isHealthy())}} - incorrect assertion message in line 225 was (Author: pbacsko): I uploaded the first version of this. I immediately realized two things: # {{filter(p -> p.getFileName().startsWith("veslot")) - }}this doesn't work properly, needs to be changed # {{assertTrue("Device should not be healthy", device.isHealthy())}} - incorrect assertion message in line 225 > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, > YARN-9477-POC2.patch, YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
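The first problem noted in the comment above is reproducible with plain `java.nio.file`: `Path.getFileName()` returns a `Path`, and `Path.startsWith()` matches whole name elements rather than string prefixes, so a device node named `veslot0` never matches the prefix `veslot`. A minimal sketch of the likely cause and fix (class and method names are illustrative, not the actual YARN code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class VeslotFilter {
    // Path.startsWith compares whole path elements, so the single name
    // element "veslot0" does NOT "start with" the element "veslot" --
    // this predicate only matches a file literally named "veslot".
    static boolean brokenMatch(Path p) {
        return p.getFileName().startsWith("veslot");
    }

    // Converting the file name to a String restores ordinary
    // string-prefix matching, which is what the filter intends.
    static boolean fixedMatch(Path p) {
        return p.getFileName().toString().startsWith("veslot");
    }

    public static void main(String[] args) {
        Path device = Paths.get("/dev/veslot0");
        System.out.println(brokenMatch(device)); // false
        System.out.println(fixedMatch(device));  // true
    }
}
```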
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831605#comment-16831605 ] Peter Bacsko commented on YARN-9477: I uploaded the first version of this. I immediately realized two things: # {{filter(p -> p.getFileName().startsWith("veslot")) - }}this doesn't work properly, needs to be changed # {{assertTrue("Device should not be healthy", device.isHealthy())}} - incorrect assertion message in line 225 > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, > YARN-9477-POC2.patch, YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-9477:
---
Attachment: YARN-9477-001.patch

> Implement VE discovery using libudev
>
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, YARN-9477-POC2.patch, YARN-9477-POC3.patch
>
> Right now we have a Python script which is able to discover VE cards using pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github (example: https://github.com/Zubnix/udev-java-bindings) but they're not available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev using JNA. We don't have to wrap every function, we need to call 4-5 methods.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831595#comment-16831595 ]

Hadoop QA commented on YARN-9524:
-
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 156 unchanged - 0 fixed = 157 total (was 156) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 50s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 8s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 44s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9524 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967621/YARN-9524-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a77445fded09 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e2f0f72 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9524:
Attachment: YARN-9524-001.patch

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Labels: Regression
> Attachments: YARN-9524-001.patch
>
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures:
> [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors:
> [ERROR] TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 » NullPointer
> [INFO]
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures:
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors:
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831542#comment-16831542 ]

Prabhu Joseph commented on YARN-9524:
-
YARN-6929 has caused two issues:

1. Test case failure of {{TestLogsCLI}} - this is due to wrong paths in the testcase; have changed the paths from the older to the new app log dir structure.

2. Test case failure of {{TestAHSWebServices}}. This is a regression which affects the ATS (both 1.5 and 2) {{LogWebService}} and {{AHSWebServices}} REST APIs while fetching the logs from the older app log dir.

{code:java}
[yarn@yarn-ats-1 ~]$ curl -s "http://yarn-ats-1:8198/ws/v2/applicationlog/containers/container_1556376250086_0006_01_01/logs"
{"exception":"WebApplicationException","message":"java.io.IOException: Can not create a Path from a null string\nCan not find remote application directory for the application:application_1556376250086_0006\n","javaClassName":"javax.ws.rs.WebApplicationException"}[yarn@yarn-ats-1 ~]$
{code}

When the REST API does not pass the appOwner, {{LogAggregationUtils}} has to guess the job user by setting a {{*}} wildcard in the Path, which was missed in YARN-6929 for the older app log dir structure. Have fixed the same.
{code:java}
[yarn@yarn-ats-3 ~]$ curl -s "http://yarn-ats-1:8198/ws/v2/applicationlog/containers/container_1556376250086_0006_01_01/logs"
[{"containerLogInfo":[{"fileName":"AppMaster.stderr","fileSize":"4158","lastModifiedTime":"Thu May 02 09:54:58 + 2019"},{"fileName":"AppMaster.stdout","fileSize":"0","lastModifiedTime":"Thu May 02 09:54:58 + 2019"},{"fileName":"directory.info","fileSize":"2125","lastModifiedTime":"Thu May 02 09:54:58 + 2019"},{"fileName":"launch_container.sh","fileSize":"4767","lastModifiedTime":"Thu May 02 09:54:58 + 2019"},{"fileName":"prelaunch.err","fileSize":"0","lastModifiedTime":"Thu May 02 09:54:58 + 2019"},{"fileName":"prelaunch.out","fileSize":"100","lastModifiedTime":"Thu May 02 09:54:58 + 2019"}],"logAggregationType":"AGGREGATED","containerId":"container_1556376250086_0006_01_01","nodeId":"yarn-ats-2_45454"}]
{code}

Verified all testcases related to log aggregation and the functionality - log aggregation, deletion, Yarn Logs CLI, HistoryServer, and the ATS logs web service.

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures:
> [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors:
> [ERROR] TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 » NullPointer
> [INFO]
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures:
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors:
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
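The appOwner-guessing described in the comment above can be illustrated with plain java.nio. This is only a sketch of the idea, not the actual {{LogAggregationUtils}} code, and it assumes an older aggregated-log layout of the form {{<remote-root>/<user>/logs/<appId>}}:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class AppOwnerGuessDemo {

    // Hypothetical helper: when the caller did not supply an appOwner,
    // scan "<remoteRoot>/*/logs/<appId>" (older aggregated-log layout,
    // assumed here) to discover which user's tree holds the app's logs.
    static Path findAppLogDir(Path remoteRoot, String appId) throws IOException {
        try (Stream<Path> users = Files.list(remoteRoot)) {
            return users.map(user -> user.resolve("logs").resolve(appId))
                        .filter(Files::isDirectory)
                        .findFirst()
                        .orElse(null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory tree that mimics the remote log root
        Path root = Files.createTempDirectory("remote-app-logs");
        Files.createDirectories(root.resolve("yarn/logs/application_1556376250086_0006"));

        // The owner ("yarn") is never passed in, yet the app dir is found:
        System.out.println(findAppLogDir(root, "application_1556376250086_0006"));
    }
}
```

In the real code the unknown user is expressed as a {{*}} component in a Hadoop {{Path}} and resolved through the remote {{FileSystem}}; the local directory listing above merely mimics that glob.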
[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures
[ https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9524:
Labels: Regression (was: )

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Labels: Regression
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures:
> [ERROR] TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors:
> [ERROR] TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543 » NullPointer
> [INFO]
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures:
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors:
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...
> [ERROR] TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » WebApplication j...{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org