[jira] [Commented] (YARN-9523) Build application catalog docker image as part of hadoop dist build

2019-05-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832093#comment-16832093
 ] 

Eric Yang commented on YARN-9523:
-

[~jeagles] no one disputed using the "dist" profile for building the docker 
image in the email thread.  Hence, patch 001 simply revises the build 
according to [~ste...@apache.org]'s suggestion.

> Build application catalog docker image as part of hadoop dist build
> ---
>
> Key: YARN-9523
> URL: https://issues.apache.org/jira/browse/YARN-9523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9523.001.patch
>
>
> It would be nice to make the application catalog docker image part of the 
> distribution.  The suggestion is to change from:
> {code:java}
> mvn clean package -Pnative,dist,docker{code}
> to
> {code:java}
> mvn clean package -Pnative,dist{code}
> Users can still build only the tarball using:
> {code:java}
> mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9523) Build application catalog docker image as part of hadoop dist build

2019-05-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9523:

Attachment: YARN-9523.001.patch

> Build application catalog docker image as part of hadoop dist build
> ---
>
> Key: YARN-9523
> URL: https://issues.apache.org/jira/browse/YARN-9523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9523.001.patch
>
>
> It would be nice to make the application catalog docker image part of the 
> distribution.  The suggestion is to change from:
> {code:java}
> mvn clean package -Pnative,dist,docker{code}
> to
> {code:java}
> mvn clean package -Pnative,dist{code}
> Users can still build only the tarball using:
> {code:java}
> mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code}






[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832077#comment-16832077
 ] 

Eric Yang commented on YARN-9524:
-

The test cases look fine, but I still can't get the "Tracking URL: History" 
link to work when accessing the job history server.

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: Regression
> Attachments: YARN-9524-001.patch, YARN-9524-002.patch
>
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}






[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type

2019-05-02 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831938#comment-16831938
 ] 

Eric Payne commented on YARN-9285:
--

[~ahussein], branch-3 is not a valid branch, so it won't need a patch. However, 
the trunk patch doesn't backport cleanly to branch-3.0. Can you please provide 
a patch for branch-3.0?

> RM UI progress column is of wrong type
> --
>
> Key: YARN-9285
> URL: https://issues.apache.org/jira/browse/YARN-9285
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 2.8.6, 2.9.3
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>  Labels: bug
> Attachments: YARN-9285-branch-2.8.001.patch, 
> YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, 
> YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch
>
>
> The column type assigned for progress column in the application report is not 
> correct.
> The rank of the progress column should be 16 and 18. In WebPageUtils.java, 
> the "atargets" needs to be incremented by 1. 






[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type

2019-05-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831920#comment-16831920
 ] 

Hudson commented on YARN-9285:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16495 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16495/])
YARN-9285: RM UI progress column is of wrong type. Contributed by  Ahmed 
(ericp: rev b094b94d43a46af9ddb910da24f792b95f614b08)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


> RM UI progress column is of wrong type
> --
>
> Key: YARN-9285
> URL: https://issues.apache.org/jira/browse/YARN-9285
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 2.8.6, 2.9.3
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>  Labels: bug
> Attachments: YARN-9285-branch-2.8.001.patch, 
> YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, 
> YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch
>
>
> The column type assigned for progress column in the application report is not 
> correct.
> The rank of the progress column should be 16 and 18. In WebPageUtils.java, 
> the "atargets" needs to be incremented by 1. 






[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-05-02 Thread Íñigo Goiri (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831916#comment-16831916
 ] 

Íñigo Goiri commented on YARN-9505:
---

Thanks [~abmodi] for the fixes.
I think we need more comments.
In particular, {{allocateLatencyOQuantiles}} should have the metric annotation 
explaining what it represents.
It has one on {{initialize()}}, but I'm not sure it is the same; in any case, it 
should have the same format as the others, with "Aggregate # of...".

Regarding the test, a few minor comments:
* Can we use {{Collections#emptyList()}} instead of {{new ArrayList}}?
* The test could use some high-level comments too.
* There are a bunch of things that are used over and over; we could do:
** EXEC_OPPORTUNISTIC = 
ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true)
** RESOURCE_1GB = Resources.createResource(1 * GB)
** PRIORITY_1 = Priority.newInstance(1)
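The constant extraction in the last bullet could look roughly like this. This is a minimal sketch with a hypothetical stand-in type, since the real YARN classes (ExecutionTypeRequest, Resources, Priority) are not reproduced here:

```java
// Sketch of the suggested constant extraction, using a hypothetical stand-in
// type; the real test would use ExecutionTypeRequest, Resources, and
// Priority from the YARN API instead.
public class SchedulerTestConstants {
  static final class ExecTypeReq {
    final String type;
    final boolean enforce;
    ExecTypeReq(String type, boolean enforce) {
      this.type = type;
      this.enforce = enforce;
    }
  }

  // Values the test builds over and over, hoisted into shared constants:
  static final ExecTypeReq EXEC_OPPORTUNISTIC = new ExecTypeReq("OPPORTUNISTIC", true);
  static final long RESOURCE_1GB_MB = 1024L; // stands in for Resources.createResource(1 * GB)
  static final int PRIORITY_1 = 1;           // stands in for Priority.newInstance(1)

  public static void main(String[] args) {
    // Each resource request in the test then reads as one short expression
    // reusing the constants instead of repeating the newInstance(...) calls.
    System.out.println(EXEC_OPPORTUNISTIC.type + " " + RESOURCE_1GB_MB
        + "MB prio=" + PRIORITY_1);
  }
}
```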

> Add container allocation latency for Opportunistic Scheduler
> 
>
> Key: YARN-9505
> URL: https://issues.apache.org/jira/browse/YARN-9505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9505.001.patch, YARN-9505.002.patch, 
> YARN-9505.003.patch
>
>
> This will help in tuning the opportunistic scheduler and its configuration 
> parameters.






[jira] [Commented] (YARN-9285) RM UI progress column is of wrong type

2019-05-02 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831906#comment-16831906
 ] 

Eric Payne commented on YARN-9285:
--

+1 LGTM
Thanks [~ahussein], will commit shortly.

> RM UI progress column is of wrong type
> --
>
> Key: YARN-9285
> URL: https://issues.apache.org/jira/browse/YARN-9285
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 2.8.6, 2.9.3
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>  Labels: bug
> Attachments: YARN-9285-branch-2.8.001.patch, 
> YARN-9285-branch-2.8.002.patch, YARN-9285-branch-2.9.001.patch, 
> YARN-9285-branch-3.001.patch, YARN-9285.001.patch, YARN-9285.002.patch
>
>
> The column type assigned for progress column in the application report is not 
> correct.
> The rank of the progress column should be 16, and 18. In WebPageUtils.java 
> the "atargets" needs to be incremented by 1. 






[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file

2019-05-02 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831903#comment-16831903
 ] 

Jim Brennan commented on YARN-9527:
---

I was able to find a node where the problem was actively happening, so I 
grabbed a heap dump of the nodemanager process and saved off the NM logs. From 
this, I was able to figure out what was happening. This sequence of events 
matches several other logs that we have examined.  Note that this analysis was 
done on our internal version of branch-2.8, but based on code inspection, I 
believe the problem still exists in trunk.

*Sequence of events, with relevant logs:*

Container transitions from NEW to LOCALIZING
{noformat}
2019-04-26 05:24:43,356 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container 
container_e29_1550394211378_12160590_01_08 transitioned from NEW to 
LOCALIZING
{noformat}
 * ContainerImpl.RequestResourcesTransition
 ** Sends a ContainerLocalizationRequestEvent to ResourceLocalizationService 
(INIT_CONTAINER_RESOURCES)
 * ResourceLocalizationService.handleInitContainerResources()
 ** Sends a ResourceRequestEvent for each LocalResourceRequest to 
LocalResourcesTrackerImpl (REQUEST); in this case, there are 11 resources

*Container transitions from LOCALIZING to KILLING (before we process any of 
these resources in LocalizerTracker)*
{noformat}
2019-04-26 05:24:43,356 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container 
container_e29_1550394211378_12160590_01_08 transitioned from LOCALIZING to 
KILLING
{noformat}
 * ContainerImpl.KillDuringLocalizationTransition
 ** container.cleanup() collects the list of privateRsrcs for this container 
and sends a ContainerLocalizationCleanup event
 * ResourceLocalizationService.handleCleanupContainerResources()
 ** For each resource, sends a ResourceReleaseEvent to LocalResourcesTrackerImpl 
(RELEASE)
 ** LocalizerTracker.cleanupPrivLocalizers() (called directly)
 *** Gets the LocalizerRunner for this container from privLocalizers
 *** *Because we have not yet handled any LocalizerResourceRequestEvents for this 
container, we don’t find a LocalizerRunner, so we just return*
 ** Deletes the container directories and sends a CONTAINER_RESOURCES_CLEANEDUP 
event to ContainerImpl

The LocalResourcesTrackerImpl thread then processes the event queue:
 * LocalResourcesTrackerImpl.handle
 ** Creates new LocalizedResources and adds them to the localrsrc map (state is INIT)
 * LocalizedResource.FetchResourceTransition
 ** Adds this container to refs
 ** Sends a LocalizerResourceRequestEvent to LocalizerTracker
 ** State changes to DOWNLOADING
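The ordering described above can be sketched as a toy event-queue model (hypothetical names, not the actual NM classes) showing why the direct cleanupPrivLocalizers() call can run before the queued REQUEST events are drained:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the event ordering (hypothetical names, not the real NM code):
// cleanupPrivLocalizers() is called directly, ahead of the queued REQUEST
// events, so no LocalizerRunner exists to stop, and the resource is later
// left in DOWNLOADING for a container that is already gone.
public class LocalizerRaceSketch {
  enum State { INIT, DOWNLOADING }

  static final Map<String, State> localrsrc = new HashMap<>();      // tracker's resources
  static final Map<String, Object> privLocalizers = new HashMap<>(); // runners per container
  static final Queue<Runnable> dispatcher = new ArrayDeque<>();      // async event queue

  static void requestResource(String rsrc, String container) {
    // The REQUEST event only takes effect when the dispatcher processes it.
    dispatcher.add(() -> {
      localrsrc.put(rsrc, State.DOWNLOADING);
      privLocalizers.put(container, new Object()); // runner created here
    });
  }

  static void cleanupContainer(String container) {
    // Called directly (not via the queue), so it can win the race;
    // with no runner registered yet, there is nothing to stop.
    privLocalizers.remove(container);
  }

  public static boolean leaksDownloadingResource() {
    localrsrc.clear(); privLocalizers.clear(); dispatcher.clear();
    requestResource("a.jar", "c1"); // REQUEST queued, not yet processed
    cleanupContainer("c1");         // cleanup runs first: finds no runner
    while (!dispatcher.isEmpty()) { dispatcher.poll().run(); } // REQUEST handled now
    // Resource is stuck DOWNLOADING and a runner exists for a dead container.
    return localrsrc.get("a.jar") == State.DOWNLOADING
        && privLocalizers.containsKey("c1");
  }

  public static void main(String[] args) {
    System.out.println("stuck in DOWNLOADING with orphaned runner: "
        + leaksDownloadingResource());
  }
}
```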

{noformat}
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_common_ws-1.2.27.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_common_grid.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_reporting_cdw_common.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/yjava_http_client-0.3.23.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/jcontrib_degrading_stats_util-0.1.17.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_batch_service_client-1.2.16.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/json-smart-1.0.6.3.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/async-http-client-0.3.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/na_cdw_cow_loader.jar
 transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event handler] INFO 
localizer.LocalizedResource: Resource 
hdfs://nn1:8020/projects/proj1/workflows/nct/tesla_dim_1h/lib/nct.jar 
transitioned from INIT to DOWNLOADING
2019-04-26 05:24:43,357 [AsyncDispatcher event
{noformat}

[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831884#comment-16831884
 ] 

ggg commented on YARN-9526:
---

[~adam.antal] I'd go one step further, if I may: a more meaningful message 
would be good, followed by a note that log aggregation is disabled, with the 
NM staying up. Alternatively, it could fall back to the TFile settings (again 
without dying).

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file

2019-05-02 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831883#comment-16831883
 ] 

Jim Brennan commented on YARN-9527:
---

For example, we recently had a case where all of the disks used by yarn were 
full:
{noformat}
Filesystem  1K-blocks   Used Available Use% Mounted on
/dev/sdb4  5776759588 5714378904   4561576 100% /grid/1
/dev/sdd2  5840971776 5775661160   6849008 100% /grid/3
/dev/sdc2  5840971776 5777982304   4527864 100% /grid/2
/dev/sda4  5776759588 5712614448   6326032 100% /grid/0
{noformat}
Upon investigation, we found the NM log full of “Invalid event: LOCALIZED at 
LOCALIZED” exceptions for a file called creative.data, and we found 2229 
copies of that file in the user's usercache:
{noformat}
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/19/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/100014/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:07 ./1/100024/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100189/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100199/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100214/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100229/creative.data
-r-x-- 1 user1 users 441478442 Nov 26 15:08 ./1/100244/creative.data
…
{noformat}
We had a record of a similar problem reported back in September of 2017.
 I scanned our clusters to see how often this was happening. On some clusters, 
a significant number of nodes had hit this “LOCALIZED at LOCALIZED” exception. 
For example, on one cluster there were 122 nodes where I found that log 
message, some with very large counts:
{noformat}
  12566 node585n18:
  15053 node585n30:
  15819 node262n14:
  36182 node582n24:
  42623 node585n28:
  7 node586n24:
  47380 node588n03:
 234528 node582n01:
 494196 node221n32:
 688038 node221n01:
1210223 node1442n30:
1306207 node194n06:
1331739 node1442n21:
1366933 node588n37:
1718461 node583n22:
2050377 node588n33:
2252679 node287n05:
{noformat}

> Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
> -
>
> Key: YARN-9527
> URL: https://issues.apache.org/jira/browse/YARN-9527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.5, 3.1.2
>Reporter: Jim Brennan
>Priority: Major
>
> A rogue ContainerLocalizer can get stuck in a loop continuously downloading 
> the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" 
> exception on each iteration.  Sometimes this continues long enough that it 
> fills up a disk or depletes available inodes for the filesystem.






[jira] [Created] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file

2019-05-02 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-9527:
-

 Summary: Rogue LocalizerRunner/ContainerLocalizer repeatedly 
downloading same file
 Key: YARN-9527
 URL: https://issues.apache.org/jira/browse/YARN-9527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.1.2, 2.8.5
Reporter: Jim Brennan


A rogue ContainerLocalizer can get stuck in a loop continuously downloading the 
same file while generating an "Invalid event: LOCALIZED at LOCALIZED" exception 
on each iteration.  Sometimes this continues long enough that it fills up a 
disk or depletes available inodes for the filesystem.






[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831858#comment-16831858
 ] 

Adam Antal commented on YARN-9526:
--

I'm +1 on a more meaningful message instead of 
java.util.NoSuchElementException. It would be much clearer for the end user 
if we raised a more descriptive exception in that case.

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Resolved] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ggg resolved YARN-9526.
---
Resolution: Not A Problem

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831845#comment-16831845
 ] 

ggg commented on YARN-9526:
---

[~Prabhu Joseph], you're spot on: adding these two properties fixed the 
issue:

{noformat}
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
<property>
  <name>yarn.log-aggregation.file-controller.TFile.class</name>
  <value>org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController</value>
</property>
{noformat}

Digging deeper, it turns out I had mistakenly linked a (much) older 
yarn-default.xml into etc/hadoop, and that file did not have those properties 
defined.
My bug.  Apologies for raising this issue, and many thanks for helping me out 
here. 

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Commented] (YARN-9525) TFile format is not working against s3a remote folder

2019-05-02 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831783#comment-16831783
 ] 

Adam Antal commented on YARN-9525:
--

The related code:
{code:java}
Path aggregatedLogFile = null;
if (context.isLogAggregationInRolling()) {
  aggregatedLogFile = initializeWriterInRolling(
      remoteLogFile, appId, nodeId);
} else {
  aggregatedLogFile = remoteLogFile;
  fsDataOStream = fc.create(remoteLogFile,
      EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
      new Options.CreateOpts[] {});
  if (uuid == null) {
    uuid = createUUID(appId);
  }
  fsDataOStream.write(uuid);
  fsDataOStream.flush();
}
long aggregatedLogFileLength = fc.getFileStatus(
    aggregatedLogFile).getLen();
// append a simple character("\n") to move the writer cursor, so
// we could get the correct position when we call
// fsOutputStream.getStartPos()
final byte[] dummyBytes = "\n".getBytes(Charset.forName("UTF-8"));
fsDataOStream.write(dummyBytes);
fsDataOStream.flush();
if (fsDataOStream.getPos() >= (aggregatedLogFileLength + dummyBytes.length)) {
  currentOffSet = 0;
} else {
  currentOffSet = aggregatedLogFileLength;
}
{code}
As far as I can see, the getFileStatus call can be omitted; it is only used to 
position the cursor.
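The offset bookkeeping in the quoted block can be mimicked with an in-memory stream. This is a minimal sketch with hypothetical stand-ins for fc.create()/getFileStatus(), not the actual Hadoop API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Simplified re-creation of the quoted offset logic, using an in-memory
// stream in place of fc.create() and getFileStatus() (hypothetical stand-ins).
public class OffsetSketch {

  // Mirrors the quoted branch: if the stream position already covers the
  // file length plus the "\n" marker, writing starts at offset 0; otherwise
  // the previous file contents determine the offset.
  public static long currentOffset(long fileLengthBeforeNewline,
                                   long streamPosAfterNewline,
                                   int newlineLen) {
    if (streamPosAfterNewline >= fileLengthBeforeNewline + newlineLen) {
      return 0;
    }
    return fileLengthBeforeNewline;
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for fsDataOStream
    byte[] uuid = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    out.write(uuid);
    long lengthSeenByGetFileStatus = out.size(); // what getFileStatus(...).getLen() would report
    byte[] dummy = "\n".getBytes(StandardCharsets.UTF_8);
    out.write(dummy);
    long pos = out.size(); // what fsDataOStream.getPos() would report for a fresh file
    // For a fresh file the positions line up, giving offset 0. On s3a the
    // getFileStatus call fails instead, because the object is not visible
    // until the stream is closed.
    System.out.println(currentOffset(lengthSeenByGetFileStatus, pos, dummy.length));
  }
}
```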

> TFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create FSDataOutputStream
> - writing out a UUID
> - flushing
> - immediately after that we call a 

[jira] [Commented] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831779#comment-16831779
 ] 

Prabhu Joseph commented on YARN-9526:
-

[~ppp] This happens if neither yarn-default.xml nor yarn-site.xml has 
{{yarn.log-aggregation.file-formats}} defined. By default, the yarn-default.xml 
that is part of hadoop-yarn-api-3.2.0.jar has this config defined. It looks 
like there is a yarn-default.xml on the NM classpath without this config. Can 
you try adding the below to yarn-site.xml?

{code}
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
{code}

We can log a meaningful message instead of java.util.NoSuchElementException. 



> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831780#comment-16831780
 ] 

Hadoop QA commented on YARN-9524:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
51s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
54s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9524 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967653/YARN-9524-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 07963a318d63 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6a42745 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24043/testReport/ |
| Max. process+thread count | 692 (vs. ulimit of 1) |
| modules | C: 

[jira] [Updated] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ggg updated YARN-9526:
--
Attachment: yarn-site.xml

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as the first task is scheduled if log aggregation is 
> enabled. Log attached.






[jira] [Created] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)
ggg created YARN-9526:
-

 Summary: NM invariably dies if log aggregation is enabled
 Key: YARN-9526
 URL: https://issues.apache.org/jira/browse/YARN-9526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.2.0
 Environment: Binary 3.2.0 hadoop release
Reporter: ggg
 Attachments: nm.log

NM dies as soon as the first task is scheduled if log aggregation is enabled. 
Log attached.






[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831758#comment-16831758
 ] 

Hadoop QA commented on YARN-9508:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 135 unchanged - 0 fixed = 137 total (was 135) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m  3s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerOvercommit |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9508 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967649/YARN-9508-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 01861472abb4 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4605db3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24042/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Assigned] (YARN-9525) TFile format is not working against s3a remote folder

2019-05-02 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-9525:


Assignee: Adam Antal

> TFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call GetFileStatus to get the length of the log 
> file (the bytes we just wrote out); that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that call, so we can use the IFile format against 
> an s3a target.






[jira] [Created] (YARN-9525) TFile format is not working against s3a remote folder

2019-05-02 Thread Adam Antal (JIRA)
Adam Antal created YARN-9525:


 Summary: TFile format is not working against s3a remote folder
 Key: YARN-9525
 URL: https://issues.apache.org/jira/browse/YARN-9525
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.1.2
Reporter: Adam Antal


Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} configured 
to an s3a URI throws the following exception during log aggregation:

{noformat}
Cannot create writer for app application_1556199768861_0001. Skip log upload 
this time. 
java.io.IOException: java.io.FileNotFoundException: No such file or directory: 
s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: 
s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
... 7 more
{noformat}

This stack trace points to 
{{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
following steps (in a non-rolling log aggregation setup):
- create an FSDataOutputStream
- write out a UUID
- flush
- immediately after that, call GetFileStatus to get the length of the log 
file (the bytes we just wrote out); that's where the failure happens: the 
file is not there yet due to eventual consistency.

Maybe we can get rid of that call, so we can use the IFile format against an 
s3a target.
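The fix direction can be sketched in plain Java (a hypothetical illustration, not the actual Hadoop patch — the class and names below are invented for this sketch): wrap the output stream in a counting decorator so the current offset is known locally, instead of asking the remote store for a FileStatus it may not yet be able to serve consistently.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: instead of calling getFileStatus() right after a
// flush (which can throw FileNotFoundException on an eventually
// consistent store such as S3), track the number of bytes written
// locally with a counting wrapper around the output stream.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

public class LogOffsetSketch {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream backing = new ByteArrayOutputStream();
        CountingOutputStream counting = new CountingOutputStream(backing);
        byte[] uuid = "0123456789abcdef".getBytes();
        counting.write(uuid);
        counting.flush();
        // The local counter gives the current offset without a
        // round-trip to the (possibly inconsistent) file system.
        System.out.println("offset=" + counting.getCount()); // offset=16
    }
}
```

With this pattern the writer never needs GetFileStatus on the path it is still writing, which is what makes the IFile controller fragile on s3a today.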






[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9524:

Attachment: YARN-9524-002.patch

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: Regression
> Attachments: YARN-9524-001.patch, YARN-9524-002.patch
>
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}






[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities

2019-05-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831669#comment-16831669
 ] 

Tao Yang commented on YARN-9440:


Thanks [~cheersyang] for the review. The remaining check-style warnings are 
all ParameterNumber (more than 7 parameters), which is unavoidable here. The 
UT failures seem unrelated to this patch, and I could not reproduce them in 
my local environment.

> Improve diagnostics for scheduler and app activities
> 
>
> Key: YARN-9440
> URL: https://issues.apache.org/jira/browse/YARN-9440
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9440.001.patch, YARN-9440.002.patch, 
> YARN-9440.003.patch
>
>
> [Design doc 
> #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx]
>  






[jira] [Commented] (YARN-9477) Implement VE discovery using libudev

2019-05-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831668#comment-16831668
 ] 

Hadoop QA commented on YARN-9477:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
12s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 46s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
|  |  Hard coded reference to an absolute pathname in 

[jira] [Updated] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-02 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9508:

Attachment: YARN-9508-001.patch

> YarnConfiguration areNodeLabel enabled is costly in allocation flow
> ---
>
> Key: YARN-9508
> URL: https://issues.apache.org/jira/browse/YARN-9508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9508-001.patch
>
>
> Locking can be avoided for every allocate request, improving performance:
> {noformat}
> "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 
> waiting for monitor entry [0x7f1ec6a8d000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
>  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674)
>  at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427)
>  - locked <0x7f24dd3f9e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>  at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
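One possible fix direction, sketched below under the assumption that the node-labels flag never changes after RM start-up (class and method names are illustrative, not the actual patch): read the flag from the synchronized {{Configuration}} once and serve every subsequent allocate-path call from a volatile cache, so the hot path no longer contends on the {{Configuration}} monitor seen in the thread dump.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: cache the result of the expensive, synchronized
// Configuration lookup in a volatile field. Every allocate call after
// the first reads the cached value without taking the monitor.
class NodeLabelsFlagCache {
    private volatile Boolean cached; // null until first lookup

    private final BooleanSupplier expensiveLookup;

    NodeLabelsFlagCache(BooleanSupplier expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    boolean areNodeLabelsEnabled() {
        Boolean v = cached;
        if (v == null) {
            // Benign race: at worst the lookup runs more than once
            // concurrently, but it always produces the same value.
            v = expensiveLookup.getAsBoolean();
            cached = v;
        }
        return v;
    }
}
```

This only works if the flag is effectively immutable at runtime; if it can be refreshed, the cache would need an invalidation hook.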






[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831651#comment-16831651
 ] 

Prabhu Joseph commented on YARN-9524:
-

[~eyang] Could you review this Jira when you get time? Thanks.

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: Regression
> Attachments: YARN-9524-001.patch
>
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}






[jira] [Commented] (YARN-9477) Implement VE discovery using libudev

2019-05-02 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831631#comment-16831631
 ] 

Peter Bacsko commented on YARN-9477:


[~snemeth] could you please check patch v1?

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, 
> YARN-9477-POC2.patch, YARN-9477-POC3.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on GitHub 
> (for example, https://github.com/Zubnix/udev-java-bindings), but they're not 
> available as Maven artifacts.
> However, it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function; we only need to call 4-5 
> methods.






[jira] [Comment Edited] (YARN-9477) Implement VE discovery using libudev

2019-05-02 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831605#comment-16831605
 ] 

Peter Bacsko edited comment on YARN-9477 at 5/2/19 1:09 PM:


I uploaded the first version of this.

I immediately realized two things:
 # {{filter(p -> p.getFileName().startsWith("veslot"))}} - this doesn't work 
properly, needs to be changed
 # {{assertTrue("Device should not be healthy", device.isHealthy())}} - 
incorrect assertion message in line 225


was (Author: pbacsko):
I uploaded the first version of this.

I immediately realized two things:
 # {{filter(p -> p.getFileName().startsWith("veslot")) - }}this doesn't work 
properly, needs to be changed
 # {{assertTrue("Device should not be healthy", device.isHealthy())}} - 
incorrect assertion message in line 225

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, 
> YARN-9477-POC2.patch, YARN-9477-POC3.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function, we need to call 4-5 methods.






[jira] [Commented] (YARN-9477) Implement VE discovery using libudev

2019-05-02 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831605#comment-16831605
 ] 

Peter Bacsko commented on YARN-9477:


I uploaded the first version of this.

I immediately realized two things:
 # {{filter(p -> p.getFileName().startsWith("veslot"))}} - this doesn't work 
properly, needs to be changed
 # {{assertTrue("Device should not be healthy", device.isHealthy())}} - 
incorrect assertion message in line 225
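
For context, the first item likely fails because {{java.nio.file.Path#startsWith}} matches whole path components rather than string prefixes. A minimal sketch of the behavior (the {{/sys/class/ve/veslot0}} path is an assumed example, not taken from the patch):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class VeslotFilterDemo {
    public static void main(String[] args) {
        Path device = Paths.get("/sys/class/ve/veslot0");
        // Path.startsWith compares whole name components, not string prefixes,
        // so the path "veslot0" does not start with the path "veslot":
        System.out.println(device.getFileName().startsWith("veslot"));          // false
        // Comparing the file name as a String gives the intended prefix check:
        System.out.println(device.getFileName().toString().startsWith("veslot")); // true
    }
}
```

So one plausible fix is filtering on {{p.getFileName().toString().startsWith("veslot")}}.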

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, 
> YARN-9477-POC2.patch, YARN-9477-POC3.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function, we need to call 4-5 methods.






[jira] [Updated] (YARN-9477) Implement VE discovery using libudev

2019-05-02 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9477:
---
Attachment: YARN-9477-001.patch

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-001.patch, YARN-9477-POC.patch, 
> YARN-9477-POC2.patch, YARN-9477-POC3.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function, we need to call 4-5 methods.






[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831595#comment-16831595
 ] 

Hadoop QA commented on YARN-9524:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 156 unchanged - 0 fixed = 157 total (was 156) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m  
8s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9524 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967621/YARN-9524-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a77445fded09 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e2f0f72 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9524:

Attachment: YARN-9524-001.patch

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: Regression
> Attachments: YARN-9524-001.patch
>
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}






[jira] [Commented] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831542#comment-16831542
 ] 

Prabhu Joseph commented on YARN-9524:
-

YARN-6929 has caused two issues:

1. Test case failure in {{TestLogsCLI}} - this is due to wrong paths in the 
test case; I have changed the paths from the older to the new app log dir structure.

2. Test case failure in {{TestAHSWebServices}}. This is a regression that 
affects the ATS (both 1.5 and 2) {{LogWebService}} and {{AHSWebServices}} REST 
APIs when fetching logs from the older app log dir.
{code:java}
[yarn@yarn-ats-1 ~]$ curl -s 
"http://yarn-ats-1:8198/ws/v2/applicationlog/containers/container_1556376250086_0006_01_01/logs"
{"exception":"WebApplicationException","message":"java.io.IOException: Can not 
create a Path from a null string\nCan not find remote application directory for 
the 
application:application_1556376250086_0006\n","javaClassName":"javax.ws.rs.WebApplicationException"}[yarn@yarn-ats-1
 ~]$ 
{code}

When the REST API call does not include the appOwner, {{LogAggregationUtils}} 
has to guess the job user by setting * (a wildcard) in the Path, which was 
missed in YARN-6929 for the older app log dir structure. I have fixed this.
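
A minimal local-filesystem sketch of that wildcard-based owner guessing (the helper name, the layout {{<root>/<user>/logs/<appId>}}, and the use of {{java.nio}} globbing instead of Hadoop's {{FileSystem.globStatus}} are all illustrative assumptions, not the actual patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.stream.Stream;

public class GuessAppOwner {
    // Hypothetical helper: locate the app log dir under the older layout
    // <root>/<user>/logs/<appId> when the owner is unknown, by substituting
    // "*" for the user component and globbing for a match.
    static Path findAppLogDir(Path root, String appId) throws IOException {
        PathMatcher m = root.getFileSystem()
                .getPathMatcher("glob:" + root + "/*/logs/" + appId);
        try (Stream<Path> paths = Files.walk(root, 3)) {
            return paths.filter(m::matches).findFirst().orElse(null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a fake remote log root with one owner directory, then search
        // for the app without knowing that the owner is "alice".
        Path root = Files.createTempDirectory("remote-app-logs");
        Files.createDirectories(root.resolve("alice/logs/application_123"));
        System.out.println(findAppLogDir(root, "application_123") != null); // true
    }
}
```

Note that the glob {{*}} matches only a single path component, which is exactly what makes it a safe stand-in for the unknown user directory.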

{code:java}
[yarn@yarn-ats-3 ~]$  curl -s 
"http://yarn-ats-1:8198/ws/v2/applicationlog/containers/container_1556376250086_0006_01_01/logs"
[{"containerLogInfo":[{"fileName":"AppMaster.stderr","fileSize":"4158","lastModifiedTime":"Thu
 May 02 09:54:58 + 
2019"},{"fileName":"AppMaster.stdout","fileSize":"0","lastModifiedTime":"Thu 
May 02 09:54:58 + 
2019"},{"fileName":"directory.info","fileSize":"2125","lastModifiedTime":"Thu 
May 02 09:54:58 + 
2019"},{"fileName":"launch_container.sh","fileSize":"4767","lastModifiedTime":"Thu
 May 02 09:54:58 + 
2019"},{"fileName":"prelaunch.err","fileSize":"0","lastModifiedTime":"Thu May 
02 09:54:58 + 
2019"},{"fileName":"prelaunch.out","fileSize":"100","lastModifiedTime":"Thu May 
02 09:54:58 + 
2019"}],"logAggregationType":"AGGREGATED","containerId":"container_1556376250086_0006_01_01","nodeId":"yarn-ats-2_45454"}]
{code}

Verified all test cases related to log aggregation, as well as the 
functionality itself: log aggregation, deletion, the YARN logs CLI, 
HistoryServer, and the ATS logs web service.

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}






[jira] [Updated] (YARN-9524) TestAHSWebServices and TestLogsCLI test case failures

2019-05-02 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9524:

Labels: Regression  (was: )

> TestAHSWebServices and TestLogsCLI test case failures
> -
>
> Key: YARN-9524
> URL: https://issues.apache.org/jira/browse/YARN-9524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: Regression
>
> {{TestAHSWebServices}} and {{TestLogsCLI}} test case failures after YARN-6929.
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestLogsCLI.testFetchApplictionLogsAsAnotherUser:1014
> [ERROR] Errors: 
> [ERROR]   
> TestLogsCLI.testFetchFinishedApplictionLogs:420->uploadEmptyContainerLogIntoRemoteDir:1543
>  » NullPointer
> [INFO] 
> [ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 1
> [ERROR] Failures: 
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR]   TestAHSWebServices.testContainerLogsForFinishedApps:624
> [ERROR] Errors: 
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...
> [ERROR]   TestAHSWebServices.testContainerLogsMetaForFinishedApps:942 » 
> WebApplication j...{code}


