[jira] [Comment Edited] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970862#comment-16970862
 ] 

Szilard Nemeth edited comment on YARN-9923 at 11/9/19 4:42 PM:
---

Hi [~adam.antal]!

Really like this rework of the NM health checks and the way you implemented 
the common interface for all health checks. This refactor could have been 
done a long time ago. Kudos for it :) 

Some comments: 
1. As the change is bigger, this patch not only deals with Docker health 
checks but also has further refactors. 
Please include your design decisions in the jira description: What is the 
purpose of the common interface, what refactors you did, how the 
implementors of the interface behave, etc.
Without these written in the description, it was pretty difficult to review the 
patch.

2. Nit: In NodeHealthScriptRunner: I know you moved this part of the code, but 
let's fix this javadoc: 
  {code}
  /** Time after which the script should be timedout. */
  private long scriptTimeout;
  {code}

  "timedout" should be "timed out".

3. Nit: In NodeHealthScriptRunner.newInstance: I think you could modify this 
log statement a bit:
  {code}
 LOG.info("Node Manager health check script is not available "
  + "or doesn't have execute permission, so not "
  + "starting the node health script runner.");
  {code}
  I think you could change the string "node health script runner" to the class 
name instead: NodeHealthScriptRunner. Maybe this way, the log message is more 
explicit and to the point.

4. In NodeHealthScriptRunner.reportHealthStatus: I think this info log should 
be warn instead:
 {code}
   default:
LOG.info("Unknown HealthCheckerExitStatus - ignored.");
break;
 {code}

In the constructor of NodeHealthScriptRunner, you have this statement: 
{code}
super(NodeHealthScriptRunner.class.getName(), chkInterval);
{code}
Typo in name "chkInterval".

5. Nit: Can you move the constructor of NodeHealthScriptRunner above or below 
the newInstance method? It was pretty hard to find it down there.

6. Nit: Javadoc of NodeHealthScriptRunner#shouldRun could be improved: 
{code}
 Method used to determine if or not node health monitoring service should be
   * started or not. Returns true if following conditions are met:
{code}
I would modify the first line as: "Method used to determine whether the health 
monitoring service should be started or not".

7. If NodeHealthScriptRunner#shouldRun returns false, can you log the same 
message but on warn or error level? I think this is an error case, as the 
script is null or empty. Isn't it? 
Maybe you could also log the script's file name on debug level, but maybe not 
in this method.

8. Can you use an enum to mark successful / unsuccessful cases when 
NodeHealthMonitorExecutor#reportHealthStatus calls setHealthStatus? I don't 
think the boolean flag is the cleanest approach. Moreover, you have the log 
statement: 
{code}
LOG.info("health status being set as " + output);
{code}, that looks weird with a true/false value.

9. Nit: In NodeHealthMonitorExecutor#reportHealthStatus: The branches SUCCESS 
and FAILED_WITH_EXIT_CODE can be merged together as they are running the same 
code.

10. Nit: Javadoc of NodeHealthMonitorExecutor#hasErrors looks weird for 
parameter 'output'

11. In the constructor of NodeHealthScriptRunner: I would simply do: 
{code}
this.task = new NodeHealthMonitorExecutor(scriptArgs);
{code}, making the code more readable and making setTimerTask only used by 
tests.

12. Nit: Access on method TimedHealthReporterService#setLastReportedTime can be 
private.

13. Is it intentional that TimedHealthReporterService#setHealthStatus(boolean, 
java.lang.String) is not calling this.setLastReportedTime(); with the current 
time? If it is, why?

14. Nit: In the javadoc of class NodeHealthCheckerService: "The class..." 
should be "This class".

15. In the javadoc of class NodeHealthCheckerService: Can you add some basic 
examples on how to use this class, how reporters should be added, etc?

16. In NodeHealthCheckerService#addHealthReporter, if the 1st if-condition 
(noneMatch) fails, can you log something? AFAIU, if this condition is false, 
someone tried to add a duplicate service, so this is more likely a programming 
error.

17. Nit: Could you rephrase the javadoc of 
NodeHealthCheckerService#getHealthReport?

18. Question about NodeHealthCheckerService#isHealthy and its usage of 
allMatch: What if all reporters are returning false as the return value of 
isHealthy? Would allMatch return true for this case? Can you write a unit test 
for this case?

19. Nit: 
DockerHealthCheckerService.DockerDaemonMonitorExecutor#getPossiblePidFileLocations:
 Can you store "docker.pid" as a constant?

20. Comments for DockerDaemonMonitorExecutor#run: 
- Please log if none of the pid file variants are found - in other words, when 
you need to fall back to reading the file from the daemon.json, you should log 
this case.
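
For item 8 above, a minimal sketch of what an enum-based status could look like. All names here (HealthStatusSketch, HealthStatus, HEALTHY, UNHEALTHY, and this setHealthStatus signature) are hypothetical and not taken from the patch; the only point is that the log line prints a named state instead of true/false.

```java
// Hypothetical sketch: names are illustrative, not from the YARN-9923 patch.
final class HealthStatusSketch {
  /** Replaces the bare boolean passed to setHealthStatus. */
  enum HealthStatus { HEALTHY, UNHEALTHY }

  private HealthStatus status = HealthStatus.HEALTHY;
  private String report = "";

  void setHealthStatus(HealthStatus newStatus, String output) {
    this.status = newStatus;
    this.report = output;
  }

  /** Builds the log text; it now ends in HEALTHY or UNHEALTHY. */
  String describe() {
    return "health status being set as " + status;
  }

  boolean isHealthy() {
    return status == HealthStatus.HEALTHY;
  }

  String getReport() {
    return report;
  }
}
```

An enum also leaves room for further states later without changing the method signature again.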

[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970862#comment-16970862
 ] 

Szilard Nemeth commented on YARN-9923:
--

Hi [~adam.antal]!

Really like this rework of the NM health checks and the way you implemented 
the common interface for all health checks. This refactor could have been 
done a long time ago. Kudos for it :) 

Some comments: 
1. As the change is bigger, this patch not only deals with Docker health 
checks but also has further refactors. 
Please include your design decisions in the jira description: What is the 
purpose of the common interface, what refactors you did, how the 
implementors of the interface behave, etc.
Without these written in the description, it was pretty difficult to review the 
patch.

2. In NodeHealthScriptRunner: I know you moved this part of the code, but let's 
fix this javadoc: 
  {code}
  /** Time after which the script should be timedout. */
  private long scriptTimeout;
  {code}

  "timedout" should be "timed out".

3. In NodeHealthScriptRunner.newInstance: I think you could modify this log 
statement a bit:
  {code}
 LOG.info("Node Manager health check script is not available "
  + "or doesn't have execute permission, so not "
  + "starting the node health script runner.");
  {code}
  I think you could change the string "node health script runner" to the class 
name instead: NodeHealthScriptRunner. Maybe this way, the log message is more 
explicit and to the point.

4. In NodeHealthScriptRunner.reportHealthStatus: I think this info log should 
be warn instead:
 {code}
   default:
LOG.info("Unknown HealthCheckerExitStatus - ignored.");
break;
 {code}

In the constructor of NodeHealthScriptRunner, you have this statement: 
{code}
super(NodeHealthScriptRunner.class.getName(), chkInterval);
{code}
Typo in name "chkInterval".

5. Can you move the constructor of NodeHealthScriptRunner above or below the 
newInstance method? It was pretty hard to find it down there.

6. Javadoc of NodeHealthScriptRunner#shouldRun could be improved: 
{code}
 Method used to determine if or not node health monitoring service should be
   * started or not. Returns true if following conditions are met:
{code}
I would modify the first line as: "Method used to determine whether the health 
monitoring service should be started or not".

7. If NodeHealthScriptRunner#shouldRun returns false, can you log the same 
message but on warn or error level? I think this is an error case, as the 
script is null or empty. Isn't it? 
Maybe you could also log the script's file name on debug level, but maybe not 
in this method.

8. Can you use an enum to mark successful / unsuccessful cases when 
NodeHealthMonitorExecutor#reportHealthStatus calls setHealthStatus? I don't 
think the boolean flag is the cleanest approach. Moreover, you have the log 
statement: 
{code}
LOG.info("health status being set as " + output);
{code}, that looks weird with a true/false value.

9. In NodeHealthMonitorExecutor#reportHealthStatus: The branches SUCCESS and 
FAILED_WITH_EXIT_CODE can be merged together as they are running the same code.

10. Javadoc of NodeHealthMonitorExecutor#hasErrors looks weird for parameter 
'output'

11. In the constructor of NodeHealthScriptRunner: I would simply do: 
{code}
this.task = new NodeHealthMonitorExecutor(scriptArgs);
{code}, making the code more readable and making setTimerTask only used by 
tests.

12. Nit: Access on method TimedHealthReporterService#setLastReportedTime can be 
private.

13. Is it intentional that TimedHealthReporterService#setHealthStatus(boolean, 
java.lang.String) is not calling this.setLastReportedTime(); with the current 
time? If it is, why?

14. Nit: In the javadoc of class NodeHealthCheckerService: "The class..." 
should be "This class".

15. In the javadoc of class NodeHealthCheckerService: Can you add some basic 
examples on how to use this class, how reporters should be added, etc?

16. In NodeHealthCheckerService#addHealthReporter, if the 1st if-condition 
(noneMatch) fails, can you log something? AFAIU, if this condition is false, 
someone tried to add a duplicate service, so this is more likely a programming 
error.

17. Nit: Could you rephrase the javadoc of 
NodeHealthCheckerService#getHealthReport?

18. Question about NodeHealthCheckerService#isHealthy and its usage of 
allMatch: What if all reporters are returning false as the return value of 
isHealthy? Would allMatch return true for this case? Can you write a unit test 
for this case?

19. Nit: 
DockerHealthCheckerService.DockerDaemonMonitorExecutor#getPossiblePidFileLocations:
 Can you store "docker.pid" as a constant?

20. Comments for DockerDaemonMonitorExecutor#run: 
- Please log if none of the pid file variants are found - in other words, when 
you need to fall back to reading the file from the daemon.json, you should log 
this case.
- 
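
On the allMatch question in item 18 above, a small self-contained sketch of the java.util.stream semantics may help. Each reporter is modelled here as a plain BooleanSupplier, which is an assumption for illustration and not the reporter type used in the patch. allMatch yields false as soon as any element fails the predicate, so "all reporters unhealthy" gives false; the case to watch is an empty reporter list, for which allMatch returns true.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in: reporters are modelled as BooleanSupplier values.
final class AllMatchSketch {
  static boolean isHealthy(List<BooleanSupplier> reporters) {
    return reporters.stream().allMatch(BooleanSupplier::getAsBoolean);
  }

  public static void main(String[] args) {
    List<BooleanSupplier> allUnhealthy =
        Arrays.<BooleanSupplier>asList(() -> false, () -> false);
    List<BooleanSupplier> mixed =
        Arrays.<BooleanSupplier>asList(() -> true, () -> false);
    List<BooleanSupplier> none = Collections.<BooleanSupplier>emptyList();

    System.out.println(isHealthy(allUnhealthy)); // false
    System.out.println(isHealthy(mixed));        // false
    System.out.println(isHealthy(none));         // true (no element fails)
  }
}
```

A unit test for the service would probably want to cover all three cases, including the empty one.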

[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970827#comment-16970827
 ] 

Szilard Nemeth commented on YARN-9923:
--

Hi [~adam.antal]!
Can you please check the checkstyle / findbugs and unit test failures as a 
first step? 

Thanks!

> Detect missing Docker binary or not running Docker daemon
> -
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: with this option set, the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception in 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other

2019-11-09 Thread kevin su (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970823#comment-16970823
 ] 

kevin su commented on YARN-9677:


Thanks [~snemeth] for the commit

> Make FpgaDevice and GpuDevice classes more similar to each other
> 
>
> Key: YARN-9677
> URL: https://issues.apache.org/jira/browse/YARN-9677
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: kevin su
>Priority: Major
>  Labels: newbie, newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-9677.001.patch
>
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice
>  is an inner class of FpgaResourceAllocator.
> It is not only being used from its parent class but from other classes as 
> well so we are losing the purpose of the inner class, it does not really make 
> sense.
> We also have 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice
>  which is a similar class, but for GPU devices.
> What we could do here is to make FpgaDevice a single class and harmonize the 
> packages of these 2 classes, meaning they should be "closer" to each other in 
> terms of packaging.






[jira] [Commented] (YARN-8148) Update decimal values for queue capacities shown on queue status CLI

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970822#comment-16970822
 ] 

Szilard Nemeth commented on YARN-8148:
--

Hi [~sunilg] / [~leftnoteasy]! Any update on this? 
Thanks!

> Update decimal values for queue capacities shown on queue status CLI
> 
>
> Key: YARN-8148
> URL: https://issues.apache.org/jira/browse/YARN-8148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8148-002.patch, YARN-8148.1.patch
>
>
> Capacities are shown with two decimal values in RM UI as part of YARN-6182. 
> The queue status CLI still shows one decimal value.
> {code}
> [root@bigdata3 yarn]# yarn queue -status default
> Queue Information : 
> Queue Name : default
>   State : RUNNING
>   Capacity : 69.9%
>   Current Capacity : .0%
>   Maximum Capacity : 70.0%
>   Default Node Label expression : 
>   Accessible Node Labels : *
>   Preemption : enabled
> {code}






[jira] [Commented] (YARN-3890) FairScheduler should show the scheduler health metrics similar to ones added in CapacityScheduler

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970819#comment-16970819
 ] 

Szilard Nemeth commented on YARN-3890:
--

Hi [~zsiegl]! 
Can I take this over?

Thanks!

> FairScheduler should show the scheduler health metrics similar to ones added 
> in CapacityScheduler
> -
>
> Key: YARN-3890
> URL: https://issues.apache.org/jira/browse/YARN-3890
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-3890.001.patch, YARN-3890.002.patch, 
> YARN-3890.003.patch
>
>
> We should add the information displayed in YARN-3293 to FairScheduler as well, 
> possibly sharing the implementation.






[jira] [Commented] (YARN-9128) Use SerializationUtils from apache commons to serialize / deserialize ResourceMappings

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970820#comment-16970820
 ] 

Szilard Nemeth commented on YARN-9128:
--

Hi [~zsiegl]! 
Can I take this over?

Thanks!

> Use SerializationUtils from apache commons to serialize / deserialize 
> ResourceMappings
> --
>
> Key: YARN-9128
> URL: https://issues.apache.org/jira/browse/YARN-9128
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Zoltan Siegl
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9128.001.patch, YARN-9128.002.patch, 
> YARN-9128.003.patch, YARN-9128.branch-3.2.001.patch
>
>







[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970818#comment-16970818
 ] 

Szilard Nemeth commented on YARN-5106:
--

Hi [~zsiegl]! 
Can I take this over?

Thanks!

> Provide a builder interface for FairScheduler allocations for use in tests
> --
>
> Key: YARN-5106
> URL: https://issues.apache.org/jira/browse/YARN-5106
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
> Attachments: YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, 
> YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, 
> YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, 
> YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, 
> YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, 
> YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, 
> YARN-5106.011.patch, YARN-5106.012.patch
>
>
> Most, if not all, fair scheduler tests create an allocations XML file. Having 
> a helper class that potentially uses a builder would make the tests cleaner. 






[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970816#comment-16970816
 ] 

Szilard Nemeth commented on YARN-9537:
--

Hi [~cane]!

Just a minor comment: In FSAppAttempt: Maybe you don't need to initialize the 
boolean field as: 

{code:java}
 private boolean enableAMPreemption =
  FairSchedulerConfiguration.DEFAULT_AM_PREEMPTION;
{code}

Since in the constructor, you have this code block: 

{code:java}
 if (scheduler.getConf() != null) {
  this.enableAMPreemption = scheduler.getConf()
  .getAMPreemptionEnabled(getQueue().getQueueName());
}
{code}

[~cane], [~yufeigu]: Can you guys think of any case when scheduler.getConf() is 
null? 
If not, then we can remove the null check and the default value for the 
enableAMPreemption boolean field and rely on the 
value coming from 
{code:java}
scheduler.getConf()
  .getAMPreemptionEnabled(getQueue().getQueueName());
{code}, completely.

Thanks!
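
The simplification suggested above can be sketched as follows. Conf and Attempt are hypothetical stand-ins for the scheduler configuration and FSAppAttempt, and the sketch assumes the configuration is never null, which is exactly the open question in the comment.

```java
// Hypothetical stand-ins; not the real FairScheduler classes.
final class PreemptionInitSketch {
  /** Minimal view of the configuration lookup used by the constructor. */
  interface Conf {
    boolean getAMPreemptionEnabled(String queueName);
  }

  static final class Attempt {
    // No field initializer: the constructor always assigns the value.
    private final boolean enableAMPreemption;

    Attempt(Conf conf, String queueName) {
      // Assumes conf is non-null, so no null check and no default are needed.
      this.enableAMPreemption = conf.getAMPreemptionEnabled(queueName);
    }

    boolean isAMPreemptionEnabled() {
      return enableAMPreemption;
    }
  }
}
```

If the configuration can in fact be null somewhere (for example in tests), the null check and the default would have to stay.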


> Add configuration to disable AM preemption
> --
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0, 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9537-002.patch, YARN-9537.001.patch, 
> YARN-9537.003.patch, YARN-9537.004.patch
>
>
> In this issue, I will add a configuration to support disabling AM preemption.






[jira] [Comment Edited] (YARN-9886) Queue mapping based on userid passed through application tag

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970813#comment-16970813
 ] 

Szilard Nemeth edited comment on YARN-9886 at 11/9/19 2:16 PM:
---

Hi [~kmarton]!

1. I was wondering what happens if 
{{RMAppManager#getUserNameFromApplicationTag}} receives a list of application 
tags, like: {"u=someuser", "u="}, so the second user is missing. So I wrote a 
test case:


{code:java}
@Test
  public void testWronglyQualifiedUserNameInTag()
  throws YarnException {
String user = "user1";
String expectedQueue = "user1Queue";
String userIdTag = "u=user2";
String wrongUserIdTag = "u=";
setApplicationTags("tag1", wrongUserIdTag, userIdTag, "tag2");
enableApplicationTagPlacement(true, user);
verifyPlacementUsername(expectedQueue, user, user);
  }
{code}


Just put it into TestAppManager and run it. It is easy to trigger an 
ArrayIndexOutOfBoundsException this way.
Please fix this one!
For reference, here's the stacktrace of the test run: 

{code:java}
java.lang.ArrayIndexOutOfBoundsException: 1

at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.getUserNameFromApplicationTag(RMAppManager.java:985)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.getUserNameForPlacement(RMAppManager.java:943)
at 
org.apache.hadoop.yarn.server.resourcemanager.AppManagerTestBase$TestRMAppManager.getUserNameForPlacement(AppManagerTestBase.java:111)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager.verifyPlacementUsername(TestAppManager.java:1322)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager.testWronglyQualifiedUserNameInTag(TestAppManager.java:1290)
{code}


2. Nit: In {{RMAppManager#getUserNameForPlacement}}: 
The following declaration + assignment could be one statement: 


{code:java}
UserGroupInformation callerUGI;
  callerUGI = UserGroupInformation.createRemoteUser(userNameFromAppTag);
{code}


3. {{TestAppManager#testGetUserNameForPlacementTagBasedPlacementWrongUserId}}: 
I think this method name is a little bit misleading. The user id is not wrong, 
it's just not whitelisted.
Btw, what's the difference between this testcase and 
{{testGetUserNameForPlacementNotWhitelistedUser}}?

Thanks!
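
The stack trace in item 1 does not show the parsing code, so the following is only a guess at the cause: in Java, "u=".split("=") returns a one-element array because trailing empty strings are dropped, and reading element 1 of it would fail with exactly this exception. Below is a minimal sketch of a parser that avoids indexing into a split result. UserTagSketch and USER_ID_PREFIX are hypothetical names, and skipping an empty "u=" tag (rather than rejecting the submission) is an assumed design choice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the RMAppManager implementation.
final class UserTagSketch {
  private static final String USER_ID_PREFIX = "u=";

  /** Returns the first non-empty user name found in the tags, or null. */
  static String getUserNameFromApplicationTag(List<String> tags) {
    for (String tag : tags) {
      if (tag.startsWith(USER_ID_PREFIX)) {
        String name = tag.substring(USER_ID_PREFIX.length()).trim();
        if (!name.isEmpty()) {
          return name;
        }
        // "u=" with nothing after it: skip this tag and keep looking.
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // One element only, so index 1 does not exist.
    System.out.println("u=".split("=").length); // 1
    System.out.println(getUserNameFromApplicationTag(
        Arrays.asList("tag1", "u=", "u=user2", "tag2"))); // user2
  }
}
```

Returning directly from the loop also removes the need for a separate variable and a break statement.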


> Queue mapping based on userid passed through application tag
> 
>
> Key: YARN-9886
> URL: https://issues.apache.org/jira/browse/YARN-9886
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch, 
> YARN-9886.002.patch, 

[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970813#comment-16970813
 ] 

Szilard Nemeth commented on YARN-9886:
--

Hi [~kmarton]!

1. I was wondering what happens if 
{{RMAppManager#getUserNameFromApplicationTag}} receives a list of application 
tags, like: {"u=someuser", "u="}, so the second user is missing. So I wrote a 
test case:


{code:java}
@Test
  public void testWronglyQualifiedUserNameInTag()
  throws YarnException {
String user = "user1";
String expectedQueue = "user1Queue";
String userIdTag = "u=user2";
String wrongUserIdTag = "u=";
setApplicationTags("tag1", wrongUserIdTag, userIdTag, "tag2");
enableApplicationTagPlacement(true, user);
verifyPlacementUsername(expectedQueue, user, user);
  }
{code}


Just put it into TestAppManager and run it. It is easy to trigger an 
ArrayIndexOutOfBoundsException this way.
Please fix this one!
For reference, here's the stacktrace of the test run: 

{code:java}
java.lang.ArrayIndexOutOfBoundsException: 1

at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.getUserNameFromApplicationTag(RMAppManager.java:985)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.getUserNameForPlacement(RMAppManager.java:943)
at 
org.apache.hadoop.yarn.server.resourcemanager.AppManagerTestBase$TestRMAppManager.getUserNameForPlacement(AppManagerTestBase.java:111)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager.verifyPlacementUsername(TestAppManager.java:1322)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager.testWronglyQualifiedUserNameInTag(TestAppManager.java:1290)
{code}


2. Nit: In {{RMAppManager#getUserNameForPlacement}}: 
The following declaration + assignment could be one statement: 


{code:java}
UserGroupInformation callerUGI;
  callerUGI = UserGroupInformation.createRemoteUser(userNameFromAppTag);
{code}


3. {{TestAppManager#testGetUserNameForPlacementTagBasedPlacementWrongUserId}}: 
I think this method name is a little bit misleading. The user id is not wrong, 
it's just not whitelisted.
Btw, what's the difference between this testcase and 
testGetUserNameForPlacementNotWhitelistedUser?

Thanks!

> Queue mapping based on userid passed through application tag
> 
>
> Key: YARN-9886
> URL: https://issues.apache.org/jira/browse/YARN-9886
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch, 
> YARN-9886.002.patch, YARN-9886.003.patch
>
>
> There are situations when the real submitting user differs from the user that 
> arrives at YARN. For example, in the case of a Hive application when Hive 
> impersonation is turned off, the hive queries will run as Hive user and the 
> mapping is done based on this username. Unfortunately in this case YARN 
> doesn't have any information about the real user and there are cases when the 
> customer may want to map these applications to the real submitting user's 
> queue instead of the Hive queue.
> For these cases, if they would pass the username in the application tag we 
> may read it and use it during the queue mapping, if that user has rights to 
> run on the real user's queue.  
> [~sunilg] please correct me if I missed something.
>  






[jira] [Commented] (YARN-9890) [UI2] Add Application tag to the app table and app detail page.

2019-11-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970797#comment-16970797
 ] 

Hudson commented on YARN-9890:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17620 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17620/])
YARN-9890. [UI2] Add Application tag to the app table and app detail (snemeth: 
rev ceb9c6175e9a0e4479a67e247e8d90eefc839fda)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-app.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/app-table-columns.js


> [UI2] Add Application tag to the app table and app detail page.
> ---
>
> Key: YARN-9890
> URL: https://issues.apache.org/jira/browse/YARN-9890
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: UI2_ApplicationTag.png, YARN-9890.001.patch
>
>
> Right now AFAIK there is no possibility to filter the applications based on 
> the application tag in the UI. Adding this new column to the app table will 
> make this filtering possible as well.
> In UI2, this information is missing from the application detail page as 
> well.






[jira] [Comment Edited] (YARN-9886) Queue mapping based on userid passed through application tag

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964063#comment-16964063
 ] 

Szilard Nemeth edited comment on YARN-9886 at 11/9/19 1:29 PM:
---

Hi [~kmarton] !

Thanks for this patch!

About your question about the documentation: I'm not aware of any generic place 
for scheduler-level info in the documentation.

First of all, can you check the checkstyle issues?

Some comments:

1. In {{yarn-default.xml}}, I would rephrase the description for 
{{"application-tag.based-placement.enable"}} a bit:
{code:java}
Whether to enable application placement based on user ID passed via
 application tags. When it is enabled, u= pattern will be checked
 and if found, the application will be placed onto the found user's queue,
 if the original user has enough rights on the passed user's queue.{code}
2. You introduced 2 new configs: 
{{application-tag.based-placement.enable}} and 
{{application-tag.based-placement.username.whitelist}}.
 Both of them start with {{"application-tag.based-placement"}}.
 For me, a more readable / intuitive prefix would be 
{{application-tag-based-placement}}, so I don't think you need the dot in this 
name. The rest of the property names look good to me.

3. There's a log string in {{RMAppManager#getUserNameForPlacement}}:
{code:java}
LOG.warn("[{}] user is not allowed to do placement based " +
 "on application tag");{code}
You don't pass the username as an argument to warn, so the placeholder is never filled in.
 On top of that, I'd rephrase the beginning of the log message as: "User '{}' 
is not allowed...".
 I'd also recheck all the log strings and use the "User '{}'" format instead of "[{}] user".

4. In {{RMAppManager#getUserNameForPlacement}}:
{code:java}
LOG.warn("There is no userId passed. The placement is done fot [{}] user",
 user);{code}
There's a typo in this message: "fot" -> "for".
 Could you also modify the message, like:
{code:java}
"userId was not found in application tags."{code}
or something like this?

5. In {{RMAppManager#getUserNameFromApplicationTag}}: 
 You don't need a variable for the {{userIdTag}}. You can simply return the 
value from the loop instead of using the break statement.

6. In {{TestRmAppManager#setApplicationTags}}: 
 The for-loop could be replaced with:
{code:java}
Collections.addAll(applicationTags, tags);{code}
7. Can you rename {{TestAppManager#checkUsername}} to 
{{verifyPlacementUsername}}?

8. Nit: In {{TestAppManager}}, some test methods start with an 
unnecessary empty line.

Nice job with the testcases, I really liked them.

Thanks!
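
For illustration only, the suggestions for {{getUserNameFromApplicationTag}} and {{setApplicationTags}} could look roughly like the sketch below. This is not the patch's code: the class name, the method signatures and the exact shape of the "u=" tag prefix are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the helpers discussed in the review; not the patch's code.
class ApplicationTagSketch {
  // Assumed tag prefix, based on the "u=" pattern mentioned in the config description.
  private static final String USER_ID_PREFIX = "u=";

  /** Returns the user ID from the first tag starting with "u=", or null if none exists. */
  static String getUserNameFromApplicationTag(Set<String> applicationTags) {
    for (String tag : applicationTags) {
      if (tag.startsWith(USER_ID_PREFIX)) {
        // Return straight from the loop: no temporary variable, no break.
        return tag.substring(USER_ID_PREFIX.length());
      }
    }
    return null;
  }

  /** Test-style helper: Collections.addAll replaces an explicit for-loop over the varargs. */
  static Set<String> setApplicationTags(String... tags) {
    Set<String> applicationTags = new LinkedHashSet<>();
    Collections.addAll(applicationTags, tags);
    return applicationTags;
  }

  public static void main(String[] args) {
    Set<String> tags = setApplicationTags("batch", "u=alice");
    System.out.println(getUserNameFromApplicationTag(tags));
  }
}
```

Returning from inside the loop keeps the "not found" case as the single fall-through return at the end of the method.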



[jira] [Commented] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other

2019-11-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970795#comment-16970795
 ] 

Hudson commented on YARN-9677:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17619/])
YARN-9677. Make FpgaDevice and GpuDevice classes more similar to each (snemeth: 
rev 31f172fd96e17c038fba2edbcaf340a323c6f7ff)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandlerImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/IntelFpgaOpenclPlugin.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/TestFpgaDiscoverer.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/AoclDiagnosticOutputParser.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/AbstractFpgaVendorPlugin.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaNodeResourceUpdateHandler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/DeviceSpecParser.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/AoclOutputBasedDiscoveryStrategy.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/FPGADiscoveryStrategy.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceHandlerImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/ScriptBasedFPGADiscoveryStrategy.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/TestAoclOutputParser.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/SettingsBasedFPGADiscoveryStrategy.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaDevice.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceAllocator.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaDiscoverer.java


> Make FpgaDevice and GpuDevice classes more similar to each other
> 
>
> Key: YARN-9677
> URL: https://issues.apache.org/jira/browse/YARN-9677
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: kevin su
>Priority: Major
>  Labels: newbie, newbie++
> Fix For: 3.3.0
>
> Attachments: YARN-9677.001.patch
>
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice
>  is an inner class of FpgaResourceAllocator.
> It is not only being used from its parent class but from other classes as 
> well so we are losing the purpose of the inner class, it does not really make 
> 

[jira] [Commented] (YARN-9899) Migration tool that help to generate CS config based on FS config [Phase 2]

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970794#comment-16970794
 ] 

Szilard Nemeth commented on YARN-9899:
--

Hi [~pbacsko] !

Just two minor things:

1. In the javadoc of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.TestFSConfigToCSConfigConverterMain#testConvertFSConfigurationDefaults:
 The example command is outdated: you no longer invoke the converter through yarn 
resourcemanager, as we now have a separate main class for it.

2. You have a typo in a method name ("Placemenet" should be "Placement"): 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.TestQueuePlacementConverter#initPlacemenetManagerMock

Other than these, patch looks good!

> Migration tool that help to generate CS config based on FS config [Phase 2] 
> 
>
> Key: YARN-9899
> URL: https://issues.apache.org/jira/browse/YARN-9899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9899-001.patch, YARN-9899-002.patch, 
> YARN-9899-003.patch, YARN-9899-004.patch, YARN-9899-005.patch
>
>
> YARN-9699 laid the groundwork for a converter from FS to CS config.
> During the development of the converter, we came up with the following things 
> to fix. 
> 1. If we don't specify a mandatory option, we have this stacktrace for 
> example:
>  
> {code:java}
> org.apache.commons.cli.MissingOptionException: Missing required option: o
>  at org.apache.commons.cli.Parser.checkRequiredOptions(Parser.java:299)
>  at org.apache.commons.cli.Parser.parse(Parser.java:231)
>  at org.apache.commons.cli.Parser.parse(Parser.java:85)
>  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.FSConfigToCSConfigArgumentHandler.parseAndConvert(FSConfigToCSConfigArgumentHandler.java:100)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1572){code}
>  
> We should provide a more concise and meaningful error message (without 
> stacktrace on the CLI, but we should log the exception with stacktrace to the 
> RM log).
> An explanation of the missing option is also required.
> 2. We may think about how to handle exceptions from commons CLI: 
> MissingArgumentException vs. MissingOptionException
> 3. We need to provide a -h / --help option for the CLI that prints all the 
> possible options / arguments.
> 4. Last but not least: We should move the CLI command to a more reasonable 
> place:
> As YARN-9699 implemented it, the command can be invoked like: 
> {code:java}
> /opt/hadoop/bin/yarn resourcemanager -convert-fs-configuration -y 
> /opt/hadoop/etc/hadoop/yarn-site.xml -f 
> /opt/hadoop/etc/hadoop/fair-scheduler.xml -r 
> ~systest/sample-rules-config.properties -o /tmp/fs-cs-output
> {code}
> This is problematic, as if YARN RM is already running, we need to stop it in 
> order to start the RM again with the conversion switch.
> 5. Add unit test coverage for {{QueuePlacementConverter}}
> 6. Close some feature gaps.
>  






[jira] [Commented] (YARN-9890) [UI2] Add Application tag to the app table and app detail page.

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970793#comment-16970793
 ] 

Szilard Nemeth commented on YARN-9890:
--

Thanks [~kmarton] for this contribution, committed to trunk!

> [UI2] Add Application tag to the app table and app detail page.
> ---
>
> Key: YARN-9890
> URL: https://issues.apache.org/jira/browse/YARN-9890
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: UI2_ApplicationTag.png, YARN-9890.001.patch
>
>
> Right now AFAIK there is no possibility to filter the applications based on 
> the application tag in the UI. Adding this new column to the app table will 
> make this filtering possible as well.
> From the UI2 this information is missing from the application detail page as 
> well.






[jira] [Commented] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970790#comment-16970790
 ] 

Szilard Nemeth commented on YARN-9677:
--

Thanks [~pingsutw] for this contribution and thanks [~pbacsko] for the review!

There were 2 unused imports that I removed as additional changes to the patch:
1. In TestAoclOutputParser: 
import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDevice;

2. In TestFpgaDiscoverer: 
import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDevice;

I don't think we want to bother with branch-3.2 and branch-3.1 patches, as this patch 
has many conflicts with branch-3.2.

Committed to trunk and closing this jira.

> Make FpgaDevice and GpuDevice classes more similar to each other
> 
>
> Key: YARN-9677
> URL: https://issues.apache.org/jira/browse/YARN-9677
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: kevin su
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-9677.001.patch
>
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice
>  is an inner class of FpgaResourceAllocator.
> It is not only being used from its parent class but from other classes as 
> well so we are losing the purpose of the inner class, it does not really make 
> sense.
> We also have 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice
>  which is a similar class, but for GPU devices.
> What we could do here is to make FpgaDevice a single class and harmonize the 
> packages of these 2 classes, meaning they should be "closer" to each other in 
> terms of packaging.






[jira] [Updated] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set

2019-11-09 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9965:

Description: 
Loading an auxiliary jar from an HDFS location on a node manager works as 
expected the first time. A subsequent restart fails with ClassNotFoundException:
{code:java}
2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: []
2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED
java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
{code}
 

The issue happens when reusing the previously localized auxiliary service jar. 
On reuse, /* is appended to the path of the localized jar file, which causes the 
failure.
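
A minimal sketch of why a trailing /* on a jar file path can leave the classpath empty, which would be consistent with the "classpath: []" line in the log above. It assumes that the class loader expands an entry ending in /* by listing the jars inside that directory. The class and method names are hypothetical and this is not Hadoop's actual implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical classpath expansion; not Hadoop's actual implementation.
class ClasspathWildcardSketch {

  /** Expands one classpath entry: "dir/*" means every .jar in dir, otherwise the file itself. */
  static List<File> expand(String entry) {
    List<File> result = new ArrayList<>();
    if (entry.endsWith("/*")) {
      File dir = new File(entry.substring(0, entry.length() - 2));
      // listFiles returns null when 'dir' is not a directory, for example when it is a jar file.
      File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
      if (jars != null) {
        for (File jar : jars) {
          result.add(jar);
        }
      }
    } else {
      File file = new File(entry);
      if (file.exists()) {
        result.add(file);
      }
    }
    return result;
  }

  /** Creates an empty temporary file with a .jar suffix to stand in for the localized jar. */
  static File newTempJar() {
    try {
      File jar = Files.createTempFile("aux-service", ".jar").toFile();
      jar.deleteOnExit();
      return jar;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    File jar = newTempJar();
    // The plain jar path resolves to the jar itself.
    System.out.println(expand(jar.getPath()));
    // The same path with "/*" appended is treated as a directory wildcard and matches nothing.
    System.out.println(expand(jar.getPath() + "/*"));
  }
}
```

Under this assumption the wildcard is only meaningful for a directory of jars, so the reused entry has to point either at the jar file itself or at its parent directory.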

 

 


[jira] [Updated] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set

2019-11-09 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9965:

Description: 
Loading an auxiliary jar from an HDFS location on a node manager works as 
expected the first time. A subsequent restart fails with ClassNotFoundException:
{code:java}
2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: []
2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED
java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
{code}
 

The issue happens when reusing the previously localized auxiliary service jar as 
the  

 

 


[jira] [Commented] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler

2019-11-09 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970758#comment-16970758
 ] 

Prabhu Joseph commented on YARN-9920:
-

[~wilfreds] Thanks for reviewing the Jira.

1. {{AccessRequest#getRemoteAddress()}} is used by Ranger Authorization for 
audit purposes. The *Client IP* field is null, as the Ranger audit screenshot shows:

!AccessAudist_yarn_clientIPempty.png|height=200!

 

2. {{Server.getRemoteAddress()}} returns the correct client IP address from the 
thread-local {{CurCall}} to any method that runs on an IPC Server thread; 
on any other thread it returns null. We call 
{{Server.getRemoteAddress()}} in the place below, which does not run on an IPC Server 
thread, and hence it returns null.

*EventDispatcher Thread -> FairScheduler#addApplication -> FSQueue.hasAccess -> 
Server.getRemoteAddress returns null*

To fix this, the patch stores the client IP address in {{RMAppImpl}} during 
{{createAndPopulateNewRMApp}}, which runs on the IPC Server thread. 
FairScheduler uses the stored address later when checking queue access.

*IPC Server -> RMAppManager#createAndPopulateNewRMApp -> AppAddedSchedulerEvent*
{code:java}
FairScheduler.java:

+  RMApp rmApp = rmContext.getRMApps().get(applicationId);
+  String remoteAddress = (rmApp != null) ?
+  rmApp.getRemoteAddress() : Server.getRemoteAddress();
+
+  if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi,
+  remoteAddress, null) &&
{code}
 

3. In {{RMWebServices}}, some places call {{checkAccess()}} directly and pass 
{{HttpServletRequest#getRemoteAddr()}}. But for operations that go through 
ClientRMService, such as submit app and move app, we still need to figure out a way 
to get the client IP address.
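
The behaviour described in point 2 can be reproduced with plain JDK classes. The sketch below is a simplified model, not Hadoop code: the thread-local, the class name and the method names are hypothetical stand-ins for the IPC server's per-call state. It shows that a value set on the handler thread is not visible on a dispatcher thread, and that capturing it on the handler thread and handing it over avoids the null.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of a per-call thread-local; names are illustrative, not Hadoop's API.
class RemoteAddressSketch {
  // Stand-in for the server's thread-local "current call" state.
  private static final ThreadLocal<String> CURRENT_CALL_ADDRESS = new ThreadLocal<>();

  /** Returns the caller's address on a handler thread, or null on any other thread. */
  static String getRemoteAddress() {
    return CURRENT_CALL_ADDRESS.get();
  }

  /**
   * Simulates an RPC handler that hands work to a dispatcher thread.
   * Returns {address seen on the dispatcher thread, address used for the access check}.
   */
  static String[] handleCall(String clientAddress) {
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    CURRENT_CALL_ADDRESS.set(clientAddress);
    try {
      // Capture the address while still on the handler thread.
      final String captured = getRemoteAddress();
      Future<String[]> result = dispatcher.submit(() -> {
        // The thread-local was never set on the dispatcher thread, so this is null.
        String seenOnDispatcher = getRemoteAddress();
        // Prefer the captured value and fall back to the thread-local lookup.
        String effective = (captured != null) ? captured : seenOnDispatcher;
        return new String[] {seenOnDispatcher, effective};
      });
      return result.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      CURRENT_CALL_ADDRESS.remove();
      dispatcher.shutdown();
    }
  }

  public static void main(String[] args) {
    String[] r = handleCall("10.0.0.7");
    System.out.println("dispatcher thread sees: " + r[0]);
    System.out.println("address used for the access check: " + r[1]);
  }
}
```

The fallback in the lambda mirrors the shape of the FairScheduler snippet above: use the stored address when it is available, otherwise ask the thread-local.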

> YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from 
> FairScheduler
> --
>
> Key: YARN-9920
> URL: https://issues.apache.org/jira/browse/YARN-9920
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, security
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: AccessAudist_yarn_clientIPempty.png, 
> YARN-9920-001.patch, YARN-9920-002.patch, YARN-9920-003.patch
>
>
> YarnAuthorizationProvider AccessRequest has null RemoteAddress in case of 
> FairScheduler. FSQueue#hasAccess uses Server.getRemoteAddress() which will be 
> null when the call is from RMWebServices and EventDispatcher. It works fine 
> when called by IPC Server Handler.
> FSQueue#hasAccess is called at three places where (2) and (3) returns null.
> *1. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> FSQueue#hasAccess 
> -> Server.getRemoteAddress returns correct Remote IP.*
>  
> *2. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> 
> AppAddedSchedulerEvent*
>     *EventDispatcher -> FairScheduler#addApplication -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null*
>   
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:509)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1268)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:133)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> {code}
>  
> *3. RMWebServices -> QueueACLsManager#checkAccess -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null.*
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.checkAccess(FairScheduler.java:1610)
> at org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:84)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:270)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:553)
> {code}
>  
> Have verified with CapacityScheduler and it works fine.




[jira] [Commented] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler

2019-11-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970754#comment-16970754
 ] 

Hadoop QA commented on YARN-9920:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} | {color:red} YARN-9920 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9920 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25130/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from 
> FairScheduler
> --
>
> Key: YARN-9920
> URL: https://issues.apache.org/jira/browse/YARN-9920
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, security
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: AccessAudist_yarn_clientIPempty.png, 
> YARN-9920-001.patch, YARN-9920-002.patch, YARN-9920-003.patch
>
>
> YarnAuthorizationProvider AccessRequest has null RemoteAddress in case of 
> FairScheduler. FSQueue#hasAccess uses Server.getRemoteAddress() which will be 
> null when the call is from RMWebServices and EventDispatcher. It works fine 
> when called by IPC Server Handler.
> FSQueue#hasAccess is called at three places where (2) and (3) returns null.
> *1. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> FSQueue#hasAccess 
> -> Server.getRemoteAddress returns correct Remote IP.*
>  
> *2. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> 
> AppAddedSchedulerEvent*
>     *EventDispatcher -> FairScheduler#addApplication -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null*
>   
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:509)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1268)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:133)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> {code}
>  
> *3. RMWebServices -> QueueACLsManager#checkAccess -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null.*
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.checkAccess(FairScheduler.java:1610)
> at org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:84)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:270)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:553)
> {code}
>  
> Have verified with CapacityScheduler and it works fine.
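
The null address on paths (2) and (3) is what one would expect if the remote address is tracked per handler thread and the access check runs on a different thread. A minimal Java sketch of that behaviour, assuming a plain ThreadLocal holds the caller address; RemoteAddressSketch, CURRENT_CALLER, checkOnHandlerThread and checkOnDispatcherThread are hypothetical names used only for illustration and are not Hadoop APIs:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RemoteAddressSketch {
    // Hypothetical stand-in for per-call state owned by an RPC handler thread.
    private static final ThreadLocal<String> CURRENT_CALLER = new ThreadLocal<>();

    // Returns the address bound to the calling thread, or null if none is bound.
    static String getRemoteAddress() {
        return CURRENT_CALLER.get();
    }

    // Path (1): the access check runs on the thread that received the call.
    static String checkOnHandlerThread(String clientIp) {
        CURRENT_CALLER.set(clientIp);
        try {
            return getRemoteAddress();
        } finally {
            CURRENT_CALLER.remove();
        }
    }

    // Path (2): the handler hands the work to another thread, which then
    // performs the access check without the handler's thread-local state.
    static String checkOnDispatcherThread(String clientIp) {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        CURRENT_CALLER.set(clientIp);
        try {
            Callable<String> accessCheck = RemoteAddressSketch::getRemoteAddress;
            return dispatcher.submit(accessCheck).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("dispatcher lookup failed", e);
        } finally {
            CURRENT_CALLER.remove();
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("handler thread:    " + checkOnHandlerThread("10.0.0.7"));
        System.out.println("dispatcher thread: " + checkOnDispatcherThread("10.0.0.7"));
    }
}
```

In this sketch the handler-thread lookup returns the address and the dispatcher-thread lookup returns null, mirroring paths (1) and (2) above. One way to avoid the null in such a design is to capture the address on the handler thread and carry it along with the event; whether the attached patches do this is not stated here.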



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler

2019-11-09 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9920:

Attachment: AccessAudist_yarn_clientIPempty.png



