[jira] [Commented] (YARN-11197) Backport YARN-9608 - DecommissioningNodesWatcher should get lists of running applications on node from RMNode.

2022-06-24 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558616#comment-17558616
 ] 

Chris Nauroth commented on YARN-11197:
--

This is good timing. Dataproc recently backported this and tested it 
internally. [~groot], I see you are currently the assignee for this issue. Are 
you actively working on it? If not, we could contribute our patch.

CC: [~abmodi], [~mkonst]

> Backport YARN-9608 - DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode.
> --
>
> Key: YARN-11197
> URL: https://issues.apache.org/jira/browse/YARN-11197
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.10.1, 2.10.2
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>
> There has been an ask, both in the community and internally, to have 
> YARN-9608 in hadoop-2.10 as well. 
> Evaluate and create a patch for the backport. 






[jira] [Commented] (YARN-11197) Backport YARN-9608 - DecommissioningNodesWatcher should get lists of running applications on node from RMNode.

2022-06-24 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558701#comment-17558701
 ] 

Chris Nauroth commented on YARN-11197:
--

Thanks, [~groot]! I would offer to code review for you, but I'll be away from 
my computer for the next 2 weeks. I'll check back when I return, but if another 
committer wants to take it up, that's great too.

> Backport YARN-9608 - DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode.
> --
>
> Key: YARN-11197
> URL: https://issues.apache.org/jira/browse/YARN-11197
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.10.1, 2.10.2
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There has been an ask, both in the community and internally, to have 
> YARN-9608 in hadoop-2.10 as well. 
> Evaluate and create a patch for the backport. 






[jira] [Commented] (YARN-11231) FSDownload set wrong permission in destinationTmp

2022-07-27 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572042#comment-17572042
 ] 

Chris Nauroth commented on YARN-11231:
--

Mode 777 is generally very dangerous. It seems like it would open a security 
risk of other users writing into the submitter's directories.

Can you provide more details about the problem and how 777 solves it? In an 
unsecured cluster, this all runs as the yarn user, so I don't see how there 
would be a problem there. In a Kerberos secured cluster, resource localization 
runs as the submitting user, which should be granted access with 755. Is there 
something unique in your configuration that causes a conflict?
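
For reference, a minimal sketch (not the actual FSDownload code; the path and 
configuration are illustrative) of creating a localized directory with 755 
permissions through the Hadoop FileSystem API. With 755 the owner keeps full 
access while group/other only get read and execute:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class CreateTmpDirSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path destinationTmp = new Path("/tmp/localize_tmp");   // illustrative path
    // 755: owner rwx, group/other r-x. The user performing localization owns
    // the directory, so it can still create children under it without 777.
    fs.mkdirs(destinationTmp, new FsPermission((short) 0755));
  }
}
{code}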

> FSDownload set wrong permission in destinationTmp
> -
>
> Key: YARN-11231
> URL: https://issues.apache.org/jira/browse/YARN-11231
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zhang Dongsheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> FSDownload calls createDir in its call method to create the destinationTmp 
> directory, which is later used as the parent directory for the directory 
> dFinal. dFinal is then used inside doAs to perform operations such as path 
> creation and path traversal. doAs cannot determine the user's identity, so 
> there is a problem with setting 755 permissions on destinationTmp here; I 
> think it should be set to 777 instead.






[jira] [Updated] (YARN-11360) Add number of decommissioning nodes to YARN cluster metrics.

2022-10-21 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-11360:
-
Description: 
YARN cluster metrics expose counts of NodeManagers in various states including 
active and decommissioned. However, these metrics don't expose NodeManagers 
that are currently in the process of decommissioning. This can look a little 
spooky to a consumer of these metrics. First, the node drops out of the active 
count, so it seems like a node just vanished. Then, later (possibly hours later 
with consideration of graceful decommission), it comes back into existence in 
the decommissioned count.

This issue tracks adding the decommissioning count to the metrics 
ResourceManager RPC. This also enables exposing it in the {{yarn top}} output. 
This metric is already visible through the REST API, so there isn't any change 
required there.

Environment: (was: YARN cluster metrics expose counts of NodeManagers 
in various states including active and decommissioned. However, these metrics 
don't expose NodeManagers that are currently in the process of decommissioning. 
This can look a little spooky to a consumer of these metrics. First, the node 
drops out of the active count, so it seems like a node just vanished. Then, 
later (possibly hours later with consideration of graceful decommission), it 
comes back into existence in the decommissioned count.

This issue tracks adding the decommissioning count to the metrics 
ResourceManager RPC. This also enables exposing it in the {{yarn top}} output. 
This metric is already visible through the REST API, so there isn't any change 
required there.
)

> Add number of decommissioning nodes to YARN cluster metrics.
> 
>
> Key: YARN-11360
> URL: https://issues.apache.org/jira/browse/YARN-11360
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> YARN cluster metrics expose counts of NodeManagers in various states 
> including active and decommissioned. However, these metrics don't expose 
> NodeManagers that are currently in the process of decommissioning. This can 
> look a little spooky to a consumer of these metrics. First, the node drops 
> out of the active count, so it seems like a node just vanished. Then, later 
> (possibly hours later with consideration of graceful decommission), it comes 
> back into existence in the decommissioned count.
> This issue tracks adding the decommissioning count to the metrics 
> ResourceManager RPC. This also enables exposing it in the {{yarn top}} 
> output. This metric is already visible through the REST API, so there isn't 
> any change required there.






[jira] [Created] (YARN-11360) Add number of decommissioning nodes to YARN cluster metrics.

2022-10-21 Thread Chris Nauroth (Jira)
Chris Nauroth created YARN-11360:


 Summary: Add number of decommissioning nodes to YARN cluster 
metrics.
 Key: YARN-11360
 URL: https://issues.apache.org/jira/browse/YARN-11360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client, resourcemanager
 Environment: YARN cluster metrics expose counts of NodeManagers in 
various states including active and decommissioned. However, these metrics 
don't expose NodeManagers that are currently in the process of decommissioning. 
This can look a little spooky to a consumer of these metrics. First, the node 
drops out of the active count, so it seems like a node just vanished. Then, 
later (possibly hours later with consideration of graceful decommission), it 
comes back into existence in the decommissioned count.

This issue tracks adding the decommissioning count to the metrics 
ResourceManager RPC. This also enables exposing it in the {{yarn top}} output. 
This metric is already visible through the REST API, so there isn't any change 
required there.

Reporter: Chris Nauroth
Assignee: Chris Nauroth









[jira] [Commented] (YARN-11360) Add number of decommissioning nodes to YARN cluster metrics.

2022-10-21 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622392#comment-17622392
 ] 

Chris Nauroth commented on YARN-11360:
--

Changing the {{yarn top}} output could be viewed as a backward-incompatible 
change according to our policy:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Command_Line_Interface_.28CLI.29

However, since {{yarn top}} is targeted at interactive use and doesn't seem 
usable in scripting anyway, I tend to think this is acceptable. I'll get input 
from others on this before committing. If necessary, I can split the {{yarn 
top}} part to a separate patch.
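
For anyone who wants to consume these counts programmatically rather than 
through {{yarn top}}, here is a minimal sketch of reading the existing cluster 
metrics through YarnClient. The getter for the new decommissioning count is 
shown commented out because its final name in this patch is an assumption:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ClusterMetricsProbe {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      YarnClusterMetrics m = yarnClient.getYarnClusterMetrics();
      System.out.println("Total NMs:          " + m.getNumNodeManagers());
      System.out.println("Active NMs:         " + m.getNumActiveNodeManagers());
      System.out.println("Decommissioned NMs: " + m.getNumDecommissionedNodeManagers());
      // Assumed getter for the new count added by this issue (name may differ):
      // System.out.println("Decommissioning NMs: " + m.getNumDecommissioningNodeManagers());
    } finally {
      yarnClient.stop();
    }
  }
}
{code}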

> Add number of decommissioning nodes to YARN cluster metrics.
> 
>
> Key: YARN-11360
> URL: https://issues.apache.org/jira/browse/YARN-11360
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>
> YARN cluster metrics expose counts of NodeManagers in various states 
> including active and decommissioned. However, these metrics don't expose 
> NodeManagers that are currently in the process of decommissioning. This can 
> look a little spooky to a consumer of these metrics. First, the node drops 
> out of the active count, so it seems like a node just vanished. Then, later 
> (possibly hours later with consideration of graceful decommission), it comes 
> back into existence in the decommissioned count.
> This issue tracks adding the decommissioning count to the metrics 
> ResourceManager RPC. This also enables exposing it in the {{yarn top}} 
> output. This metric is already visible through the REST API, so there isn't 
> any change required there.






[jira] [Updated] (YARN-11360) Add number of decommissioning/shutdown nodes to YARN cluster metrics.

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-11360:
-
Summary: Add number of decommissioning/shutdown nodes to YARN cluster 
metrics.  (was: Add number of decommissioning nodes to YARN cluster metrics.)

[~mkonst], thank you for the review. I've updated this to include the shutdown 
count like you suggested.

> Add number of decommissioning/shutdown nodes to YARN cluster metrics.
> -
>
> Key: YARN-11360
> URL: https://issues.apache.org/jira/browse/YARN-11360
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>
> YARN cluster metrics expose counts of NodeManagers in various states 
> including active and decommissioned. However, these metrics don't expose 
> NodeManagers that are currently in the process of decommissioning. This can 
> look a little spooky to a consumer of these metrics. First, the node drops 
> out of the active count, so it seems like a node just vanished. Then, later 
> (possibly hours later with consideration of graceful decommission), it comes 
> back into existence in the decommissioned count.
> This issue tracks adding the decommissioning count to the metrics 
> ResourceManager RPC. This also enables exposing it in the {{yarn top}} 
> output. This metric is already visible through the REST API, so there isn't 
> any change required there.






[jira] [Resolved] (YARN-11360) Add number of decommissioning/shutdown nodes to YARN cluster metrics.

2022-10-28 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11360.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
 Hadoop Flags: Reviewed
   Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2 (after resolving a 
minor merge conflict). [~mkonst], [~groot] and [~abmodi], thank you for the 
code reviews.

> Add number of decommissioning/shutdown nodes to YARN cluster metrics.
> -
>
> Key: YARN-11360
> URL: https://issues.apache.org/jira/browse/YARN-11360
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> YARN cluster metrics expose counts of NodeManagers in various states 
> including active and decommissioned. However, these metrics don't expose 
> NodeManagers that are currently in the process of decommissioning. This can 
> look a little spooky to a consumer of these metrics. First, the node drops 
> out of the active count, so it seems like a node just vanished. Then, later 
> (possibly hours later with consideration of graceful decommission), it comes 
> back into existence in the decommissioned count.
> This issue tracks adding the decommissioning count to the metrics 
> ResourceManager RPC. This also enables exposing it in the {{yarn top}} 
> output. This metric is already visible through the REST API, so there isn't 
> any change required there.






[jira] [Resolved] (YARN-11363) Remove unused TimelineVersionWatcher and TimelineVersion from hadoop-yarn-server-tests

2022-11-01 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11363.
--
Fix Version/s: 3.3.5
   3.4.0
   Resolution: Fixed

> Remove unused TimelineVersionWatcher and TimelineVersion from 
> hadoop-yarn-server-tests 
> ---
>
> Key: YARN-11363
> URL: https://issues.apache.org/jira/browse/YARN-11363
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test, yarn
>Affects Versions: 3.3.3, 3.3.4
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> Verify and remove unused TimelineVersionWatcher and TimelineVersion from 
> hadoop-yarn-server-tests 






[jira] [Created] (YARN-11388) Prevent resource leaks in TestClientRMService.

2022-12-05 Thread Chris Nauroth (Jira)
Chris Nauroth created YARN-11388:


 Summary: Prevent resource leaks in TestClientRMService.
 Key: YARN-11388
 URL: https://issues.apache.org/jira/browse/YARN-11388
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth


While working on YARN-11360, I noticed a few problems in 
{{TestClientRMService}} that made it difficult to work with. Tests do not 
guarantee that the servers they start up get shut down. If an individual test 
fails, it can leave TCP sockets bound, causing subsequent tests in the suite to 
fail on their socket bind attempts for the same port. There is also a file 
generated by a test that leaks outside of the build directory into the source 
tree.
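
A minimal, hypothetical sketch of the cleanup pattern implied here (not the 
actual test code): stop any server a test started even when the test fails, so 
its ports are released for the rest of the suite.

{code:java}
import org.apache.hadoop.service.Service;
import org.junit.After;

public class TestServerCleanupSketch {
  private Service rm;   // e.g. a MockRM started by an individual test case

  @After
  public void tearDown() {
    // Runs even when the test body throws, so the RM's listener sockets are
    // released and later tests can bind the same ports.
    if (rm != null) {
      rm.stop();
      rm = null;
    }
  }
}
{code}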






[jira] [Resolved] (YARN-11390) TestResourceTrackerService.testNodeRemovalNormally: Shutdown nodes should be 0 now expected: <1> but was: <0>

2022-12-08 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11390.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

[~bkosztolnik] , thank you for the contribution. [~pszucs], thank you for 
reviewing. I have committed this to trunk, branch-3.3 and branch-3.2. For the 
cherry-picks to branch-3.3 and branch-3.2, I resolved some minor merge 
conflicts and confirmed a successful test run.

> TestResourceTrackerService.testNodeRemovalNormally: Shutdown nodes should be 
> 0 now expected: <1> but was: <0>
> -
>
> Key: YARN-11390
> URL: https://issues.apache.org/jira/browse/YARN-11390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Bence Kosztolnik
>Assignee: Bence Kosztolnik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> Sometimes TestResourceTrackerService.{*}testNodeRemovalNormally{*} fails 
> with the following message:
> {noformat}
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:1723)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:1685)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalNormally(TestResourceTrackerService.java:1530){noformat}
> This can happen if the hardcoded 1s sleep in the test is not enough for a 
> proper shutdown.
> To fix this issue, we should poll the cluster status with a timeout and verify 
> that the cluster reaches the expected state, as sketched below.
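
A minimal sketch of that polling approach, using Hadoop's GenericTestUtils; the 
exact condition checked here is illustrative:

{code:java}
import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics;

// Inside the test: instead of a fixed 1s sleep, check every 100 ms and fail
// only if the expected state is not reached within 10 seconds.
GenericTestUtils.waitFor(
    () -> ClusterMetrics.getMetrics().getNumShutdownNMs() == 0,
    100, 10_000);
{code}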






[jira] [Commented] (YARN-11397) Memory leak when reading aggregated logs from s3 (LogAggregationTFileController::readAggregatedLogs)

2022-12-16 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648770#comment-17648770
 ] 

Chris Nauroth commented on YARN-11397:
--

Is this the same {{S3AInstrumentation}} leak issue as HADOOP-18526, which is 
scheduled for inclusion in the upcoming 3.3.5 release?

CC: [~ste...@apache.org]

> Memory leak when reading aggregated logs from s3 
> (LogAggregationTFileController::readAggregatedLogs)
> 
>
> Key: YARN-11397
> URL: https://issues.apache.org/jira/browse/YARN-11397
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.2
> Environment: Remote logs dir on s3.
>Reporter: Maciej Smolenski
>Priority: Critical
> Attachments: YarnLogsS3Issue.scala
>
>
> Reproduction code in the attachment.
> When collecting aggregated logs from s3 in a loop (see reproduction code) we 
> can easily see that the number of 'S3AInstrumentation' is increasing although 
> the number of 'S3AFileSystem' is not increasing. It means that 
> 'S3AInstrumentation' is not released together with 'S3AFileSystem' as it 
> should be. The root cause of this seems to be the missing close on 
> S3AFileSystem.
> The issue seems similar to https://issues.apache.org/jira/browse/YARN-11039 
> but the issue is a 'memory leak' (not a 'thread leak') and affected version 
> is earlier here (3.2.2).
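
Not the actual log-aggregation code, but a minimal sketch of the kind of fix 
the description points at: close the FileSystem instance opened for reading 
remote logs so that its S3AInstrumentation is released with it (the path is 
illustrative).

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReadAggregatedLogsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI remoteLogDir = URI.create("s3a://bucket/app-logs");   // illustrative
    // Use a non-cached instance and close it when done; if the S3AFileSystem
    // is never closed, its instrumentation accumulates as described above.
    try (FileSystem fs = FileSystem.newInstance(remoteLogDir, conf)) {
      // ... read the aggregated log files under remoteLogDir ...
    }
  }
}
{code}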






[jira] [Assigned] (YARN-11392) ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods but sometimes forgets to use them, causing audit logs to go missing.

2022-12-22 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-11392:


Assignee: Beibei Zhao

> ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods 
> but sometimes forgets to use them, causing audit logs to go missing.
> 
>
> Key: YARN-11392
> URL: https://issues.apache.org/jira/browse/YARN-11392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: Beibei Zhao
>Assignee: Beibei Zhao
>Priority: Major
>  Labels: audit, log, pull-request-available, yarn
>
> ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods.
> {code:java}
> private UserGroupInformation getCallerUgi(ApplicationId applicationId,
>   String operation) throws YarnException {
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>   RMAuditLogger.logFailure("UNKNOWN", operation, "UNKNOWN",
>   "ClientRMService", "Error getting UGI", applicationId);
>   throw RPCUtil.getRemoteException(ie);
> }
> return callerUGI;
>   }
> {code}
> *Privileged operations* like "getContainerReport" (which calls checkAccess 
> before the operation) call these helpers and *record audit logs* when an 
> *exception* happens, but some code paths forget to use them, causing audit 
> logs to go {*}missing{*}: 
> {code:java}
> // getApplicationReport
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>      // a logFailure should be called here. 
>      throw RPCUtil.getRemoteException(ie);
> }
> {code}
> So, I will replace some code blocks like this with getCallerUgi or 
> verifyUserAccessForRMApp.
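
A hedged sketch of that replacement: route the UGI lookup through the existing 
helper so the failure path is audited before the exception is rethrown (the 
operation name string is illustrative).

{code:java}
// Inside getApplicationReport: reuse the helper instead of calling
// UserGroupInformation.getCurrentUser() directly, so an RMAuditLogger entry
// is written on failure as well.
UserGroupInformation callerUGI = getCallerUgi(applicationId, "Get Application Report");
{code}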






[jira] [Resolved] (YARN-11392) ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods but sometimes forgets to use them, causing audit logs to go missing.

2022-12-27 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11392.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2. [~chino71], thank 
you for the contribution.

> ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods 
> but sometimes forgets to use them, causing audit logs to go missing.
> 
>
> Key: YARN-11392
> URL: https://issues.apache.org/jira/browse/YARN-11392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: Beibei Zhao
>Assignee: Beibei Zhao
>Priority: Major
>  Labels: audit, log, pull-request-available, yarn
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> ClientRMService implemented getCallerUgi and verifyUserAccessForRMApp methods.
> {code:java}
> private UserGroupInformation getCallerUgi(ApplicationId applicationId,
>   String operation) throws YarnException {
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>   RMAuditLogger.logFailure("UNKNOWN", operation, "UNKNOWN",
>   "ClientRMService", "Error getting UGI", applicationId);
>   throw RPCUtil.getRemoteException(ie);
> }
> return callerUGI;
>   }
> {code}
> *Privileged operations* like "getContainerReport" (which calls checkAccess 
> before the operation) call these helpers and *record audit logs* when an 
> *exception* happens, but some code paths forget to use them, causing audit 
> logs to go {*}missing{*}: 
> {code:java}
> // getApplicationReport
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>      // a logFailure should be called here. 
>      throw RPCUtil.getRemoteException(ie);
> }
> {code}
> So, I will replace some code blocks like this with getCallerUgi or 
> verifyUserAccessForRMApp.






[jira] [Resolved] (YARN-11388) Prevent resource leaks in TestClientRMService.

2022-12-28 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11388.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

I have merged this to trunk, branch-3.3 and branch-3.2 (after resolving some 
minor merge conflicts). [~slfan1989] , thank you for your review!

> Prevent resource leaks in TestClientRMService.
> --
>
> Key: YARN-11388
> URL: https://issues.apache.org/jira/browse/YARN-11388
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> While working on YARN-11360, I noticed a few problems in 
> {{TestClientRMService}} that made it difficult to work with. Tests do not 
> guarantee that the servers they start up get shut down. If an individual test 
> fails, it can leave TCP sockets bound, causing subsequent tests in the 
> suite to fail on their socket bind attempts for the same port. There is also 
> a file generated by a test that leaks outside of the build directory 
> into the source tree.






[jira] [Resolved] (YARN-11231) FSDownload set wrong permission in destinationTmp

2023-01-07 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11231.
--
  Assignee: Zhang Dongsheng
Resolution: Won't Fix

Hello [~skysider]. I noticed you closed pull request 
[#4629|https://github.com/apache/hadoop/pull/4629]. I assume you are abandoning 
this change, because 777 would be too dangerous, so I'm also closing this 
corresponding JIRA issue. (If I misunderstood, and you're still working on 
something for this, then the issue can be reopened.)

> FSDownload set wrong permission in destinationTmp
> -
>
> Key: YARN-11231
> URL: https://issues.apache.org/jira/browse/YARN-11231
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zhang Dongsheng
>Assignee: Zhang Dongsheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> FSDownload calls createDir in its call method to create the destinationTmp 
> directory, which is later used as the parent directory for the directory 
> dFinal. dFinal is then used inside doAs to perform operations such as path 
> creation and path traversal. doAs cannot determine the user's identity, so 
> there is a problem with setting 755 permissions on destinationTmp here; I 
> think it should be set to 777 instead.






[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails causes RM recovery to take too much time

2023-04-06 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709507#comment-17709507
 ] 

Chris Nauroth commented on YARN-10739:
--

I've seen trouble with this in 3.3 and 3.2 clusters. This patch does not depend 
on the larger YARN-10695 umbrella effort, so I'm planning to cherry-pick it to 
branch-3.3 and branch-3.2. I'll wait a day in case anyone has objections. See 
also YARN-11286.

> GenericEventHandler.printEventQueueDetails causes RM recovery to take too 
> much time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch, 
> YARN-10739.005.patch, YARN-10739.006.patch
>
>
> YARN-8995 and YARN-10642 added GenericEventHandler.printEventQueueDetails to 
> AsyncDispatcher. If the event queue grows too large, printEventQueueDetails 
> costs too much time and the RM takes a long time to process events.
> For example:
>  If we have 4K nodes in the cluster and 4K apps running, then during an RM 
> switchover the NodeManagers re-register with the RM, and the RM calls 
> NodesListManager to send RMAppNodeUpdateEvent, with code like below:
> {code:java}
> for (RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
>     this.rmContext
>         .getDispatcher()
>         .getEventHandler()
>         .handle(new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
>             appNodeUpdateType));
>   }
> }{code}
> So the total number of events is 4K*4K = 16 million. During this window, 
> GenericEventHandler.printEventQueueDetails is called frequently to print the 
> event queue details; once the event queue size reaches 1 million+, iterating 
> the queue in printEventQueueDetails becomes very slow, as shown below: 
> {code:java}
> private void printEventQueueDetails() {
>   Iterator<Event> iterator = eventQueue.iterator();
>   Map<Enum, Long> counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
>     Enum eventType = iterator.next().getType();
> {code}
> Then RM recovery will cost too much time.
>  Refer to our log:
> {code:java}
> 2021-04-14 20:35:34,432 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(306)) - Size of event-queue is 1200
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: KILL, Event 
> record counter: 310836
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: NODE_UPDATE, 
> Event record counter: 1103
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: 
> NODE_REMOVED, Event record counter: 1
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: APP_REMOVED, 
> Event record counter: 1
> {code}
> Between AsyncDispatcher.handle and printEventQueueDetails, more than 1 second 
> is spent iterating the queue.
> I uploaded a patch to ensure that printEventQueueDetails is only called once 
> per 30s.
>  
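
A hedged sketch of that throttling (field and method names are illustrative, 
not necessarily what the committed patch uses):

{code:java}
import org.apache.hadoop.util.Time;

// Print event queue details at most once per 30 seconds.
private static final long PRINT_INTERVAL_MS = 30 * 1000L;
private volatile long lastEventDetailsPrintTime = 0;

void maybePrintEventQueueDetails() {
  long now = Time.monotonicNow();
  if (now - lastEventDetailsPrintTime >= PRINT_INTERVAL_MS) {
    lastEventDetailsPrintTime = now;
    printEventQueueDetails();   // still O(queue size), but called rarely
  }
}
{code}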






[jira] [Commented] (YARN-11286) Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter configurable

2023-04-06 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709508#comment-17709508
 ] 

Chris Nauroth commented on YARN-11286:
--

I've seen trouble with this in 3.3 and 3.2 clusters. This patch does not depend 
on the larger YARN-10695 umbrella effort, so I'm planning to cherry-pick it to 
branch-3.3 and branch-3.2. I'll wait a day in case anyone has objections. See 
also YARN-10739.

> Make AsyncDispatcher#printEventDetailsExecutor thread pool parameter 
> configurable
> -
>
> Key: YARN-11286
> URL: https://issues.apache.org/jira/browse/YARN-11286
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> AsyncDispatcher#printEventDetailsExecutor thread pool parameters are 
> hard-coded, extract this part of hard-coded configuration parameters to the 
> configuration file.
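
A hedged sketch of the change described: build the executor from a configurable 
pool size instead of a hard-coded constant. The property key, default, and 
surrounding names below are illustrative, not necessarily the ones added by 
this patch.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only -- the real configuration key and default may differ.
int poolSize = conf.getInt("yarn.dispatcher.print-events-info.threads", 1);
ExecutorService printEventDetailsExecutor = Executors.newFixedThreadPool(poolSize);
{code}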






[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3458:

Assignee: Inigo Goiri  (was: Chris Nauroth)

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.





[jira] [Assigned] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-3458:
---

Assignee: Chris Nauroth  (was: Inigo Goiri)

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.





[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3458:

Hadoop Flags: Reviewed

+1 for patch v9.  I'll wait a few days before committing, since I see other 
watchers on the issue.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.





[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-02-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139292#comment-15139292
 ] 

Chris Nauroth commented on YARN-4681:
-

Linking to YARN-1775, which initially implemented the support for reading 
/proc/<pid>/smaps information.  Also notifying a few of the participants on 
that earlier discussion: [~rajesh.balamohan], [~ka...@cloudera.com] and 
[~vinodkv].

This issue relates to this discussion on the dev mailing list:

http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201602.mbox/%3CD2DF3D85.3AAF8%25cnauroth%40hortonworks.com%3E

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
> Attachments: YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to calculate only {{Locked}} pages instead.





[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-02-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139318#comment-15139318
 ] 

Chris Nauroth commented on YARN-4681:
-

[~je.ik], thank you for posting a patch.  Have you had a chance to try testing 
this with the Spark use case that you described on the mailing list?

One important point that you brought up on the mailing list is that the Locked 
field in smaps doesn't seem to be universal across all tested kernel versions.

bq. If I do this on an older kernel (2.6.x), the Locked field is missing.

The current patch completely switches the calculation from using Private_Clean 
to Locked.  For a final version of the patch, we'd want to make sure that the 
change doesn't break anything for older kernels that don't show the Locked 
field.

You also discussed possibly even more aggressive strategies, like trying to 
anticipate that the kernel might free more memory by flushing file-backed dirty 
pages.  During YARN-1775, there was some discussion about supporting different 
kinds of configurability for this calculation.  That topic might warrant 
further discussion here.
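
To make the difference concrete, a small illustration of the two calculations 
being discussed; the field names follow /proc/<pid>/smaps and the numbers are 
made up:

{code:java}
// Per-mapping figures read from /proc/<pid>/smaps, in kB (made-up values).
long pss = 2048, sharedDirty = 512, privateDirty = 1024, privateClean = 4096, locked = 64;

// Current calculation: counts private clean pages, which the kernel may
// reclaim when they are not mlocked.
long current = Math.min(pss, sharedDirty) + privateDirty + privateClean;   // 5632 kB

// Patch as described above: Private_Clean replaced by Locked.
long proposed = Math.min(pss, sharedDirty) + privateDirty + locked;        // 1600 kB
{code}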

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
> Attachments: YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to calculate only {{Locked}} pages instead.





[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139679#comment-15139679
 ] 

Chris Nauroth commented on YARN-4594:
-

This patch broke compilation on Mac OS X 10.9, where the SDK does not include a 
definition of {{AT_REMOVEDIR}} in fcntl.h or unistd.h.  If you have the 10.10 
SDK installed, then the header does have {{AT_REMOVEDIR}}.  (See 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/sys/fcntl.h).
  I haven't yet figured out if there is an easy way Mac users can set it to 
cross-compile with the 10.10 SDK as a workaround.  For now, I'm just going to 
have to skip this part of the build on Mac.

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.9.0
>
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.





[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139709#comment-15139709
 ] 

Chris Nauroth commented on YARN-4594:
-

I don't actually run it on Mac.  The impact though is that I can no longer do a 
full distro build of the source tree with {{-Pnative}}.  libhadoop.dylib is at 
least partially functional.

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.9.0
>
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.





[jira] [Commented] (YARN-4594) container-executor fails to remove directory tree when chmod required

2016-02-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139827#comment-15139827
 ] 

Chris Nauroth commented on YARN-4594:
-

[~cmccabe], no worries, and no finger pointing intended.  I only meant to 
document what I had found here in case other Mac users stumble on the same 
issue.

FWIW, I'd prefer not to patch the Hadoop source at all and instead find some 
external way to target the 10.10 SDK, where the constant is defined.

> container-executor fails to remove directory tree when chmod required
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.9.0
>
> Attachments: YARN-4594.001.patch, YARN-4594.002.patch, 
> YARN-4594.003.patch, YARN-4594.004.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.





[jira] [Updated] (YARN-4682) AMRM client to log when AMRM token updated

2016-02-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-4682:

Assignee: Prabhu Joseph

[~Prabhu Joseph], I added you as a contributor on the YARN project and assigned 
this issue to you.  Thanks for the patch!

[~ste...@apache.org], I added you to the Committers role in the YARN project, 
so you should have the rights to do this in the future.

> AMRM client to log when AMRM token updated
> --
>
> Key: YARN-4682
> URL: https://issues.apache.org/jira/browse/YARN-4682
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Prabhu Joseph
> Attachments: YARN-4682-002.patch, YARN-4682.patch, YARN-4682.patch.1
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> There's no information right now as to when the AMRM token gets updated; if 
> something has gone wrong with the update, you can't tell when it last when 
> through.
> fix: add a log statement.





[jira] [Updated] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-03-01 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-4681:

Assignee: Jan Lukavsky

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
>Assignee: Jan Lukavsky
> Attachments: YARN-4681.patch, YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to calculate only {{Locked}} pages instead.





[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174361#comment-15174361
 ] 

Chris Nauroth commented on YARN-4681:
-

[~je.ik], thank you for updating the patch.  I'm +1 for this change, pending a 
pre-commit test run from Jenkins.  I just clicked the Submit Patch button, so 
Jenkins should pick it up now.

However, I'm not confident enough to commit it immediately.  I'd like to see 
reviews from committers who spend more time in YARN than me.  I'd also like to 
find out if anyone thinks it should be configurable whether it checks locked or 
performs the old calculation.  I don't have a sense for how widely people are 
dependent on the current smaps checks.

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
>Assignee: Jan Lukavsky
> Attachments: YARN-4681.patch, YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to calculate only {{Locked}} pages instead.





[jira] [Commented] (YARN-4887) AM-RM protocol changes for identifying resource-requests explicitly

2016-05-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295122#comment-15295122
 ] 

Chris Nauroth commented on YARN-4887:
-

Hello [~subru].  I think the YARN build would need configuration to exclude 
protobuf-generated sources, which do tend to generate a lot of Javadoc warnings 
that we can't do anything about.  For example, here is an exclusion from 
hadoop-hdfs-project/hadoop-hdfs/pom.xml:

{code}
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <configuration>
      <excludePackageNames>org.apache.hadoop.hdfs.protocol.proto</excludePackageNames>
    </configuration>
  </plugin>
{code}

I don't see a similar exclusion in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml.

> AM-RM protocol changes for identifying resource-requests explicitly
> ---
>
> Key: YARN-4887
> URL: https://issues.apache.org/jira/browse/YARN-4887
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4887-v1.patch, YARN-4887-v2.patch
>
>
> YARN-4879 proposes the addition of a simple delta allocate protocol. This 
> JIRA is to track the changes in AM-RM protocol to accomplish it. The crux is 
> the addition of ID field in ResourceRequest and Container. The detailed 
> proposal is in the parent JIRA.






[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-10-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939990#comment-14939990
 ] 

Chris Nauroth commented on YARN-3514:
-

Hello [~vvasudev].

As per prior comments from [~leftnoteasy] and [~vinodkv], we suspect the 
current patch does not fully address all potential problems with use of Active 
Directory "DOMAIN\login" usernames in YARN.  I don't have bandwidth right now 
to hunt down those additional problems and fix them.

I think these are the options for handling this JIRA now:
# Finish the review of the fix that is already here and commit it.  Handle 
subsequent issues in separate JIRAs.
# Unassign it from me and see if someone else can pick it up, run with my 
current patch, look for more problems and then turn that into a more 
comprehensive patch.
# Continue to let this linger until I or someone else frees up time for more 
investigation.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g





[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047165#comment-15047165
 ] 

Chris Nauroth commented on YARN-4248:
-

This patch introduced license warnings on the testing json files.  Here is an 
example from the latest pre-commit run on HADOOP-11505.

https://builds.apache.org/job/PreCommit-HADOOP-Build/8202/artifact/patchprocess/patch-asflicense-problems.txt

Would you please either revert or quickly correct the license warning?  Thank 
you.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248.2.patch, YARN-4248.3.patch, YARN-4248.5.patch, 
> YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047389#comment-15047389
 ] 

Chris Nauroth commented on YARN-4248:
-

Hi [~curino].  It looks like [~chris.douglas] just uploaded a patch to set up 
an exclusion of the json files from the license check.  +1 for this.  Thanks, 
Chris.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047553#comment-15047553
 ] 

Chris Nauroth commented on YARN-4248:
-

bq. Not sure why it wasn't flagged by test-patch.

I decided to dig into this.  At the time that pre-commit ran for YARN-4248, 
there was an unrelated license warning present in HDFS, introduced by HDFS-9414.

https://builds.apache.org/job/PreCommit-YARN-Build/9872/artifact/patchprocess/patch-asflicense-problems.txt

Unfortunately, if there is a pre-existing license warning, then the {{mvn 
apache-rat:check}} build halts at that first failing module.  Since 
hadoop-hdfs-client builds before hadoop-yarn-server-resourcemanager, it masked 
the new license warnings introduced by this patch.  This is visible here if you 
scroll to the bottom and notice module Apache Hadoop HDFS Client failed, 
followed by skipping all subsequent modules.

https://builds.apache.org/job/PreCommit-YARN-Build/9872/artifact/patchprocess/patch-asflicense-root.txt

Maybe we can do better when there are pre-existing license warnings, perhaps by 
using the {{--fail-at-end}} option to make sure we check all modules.  I filed 
YETUS-221.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-05-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527907#comment-14527907
 ] 

Chris Nauroth commented on YARN-3514:
-

Looking at the original description, I see upper-case "DOMAIN" is getting 
translated to lower-case "domain" in this environment.  It's likely that this 
environment would get an ownership mismatch error even after getting past the 
current bug.

{code}
drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
{code}

Nice catch, Wangda.

Is it necessary to translate to lower-case, or can the domain portion of the 
name be left in upper-case to match the OS level?

bq. One possible solution is ignoring cases while compare user name, but that 
will be problematic when user "De"/"de" existed at the same time.

I've seen a few mentions online that Active Directory is not case-sensitive but 
is case-preserving.  That means it will preserve the case you used in 
usernames, but the case doesn't matter for comparisons.  I've also seen 
references that DNS has similar behavior with regards to case.

I can't find a definitive statement though that this is guaranteed behavior.  
I'd feel safer making this kind of change if we had a definitive reference.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g

[jira] [Commented] (YARN-3549) use JNI-based FileStatus implementation from io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation from RawLocalFileSystem in checkLocalDir.

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528725#comment-14528725
 ] 

Chris Nauroth commented on YARN-3549:
-

Hi [~djp].  This will be a YARN code change in the localizer, so YARN is the 
appropriate project to track it.  The code change will involve calling a native 
fstat method provided in hadoop-common, but that code already exists, and I 
don't expect it will need any changes to support this.

> use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> 
>
> Key: YARN-3549
> URL: https://issues.apache.org/jira/browse/YARN-3549
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> As discussed in YARN-3491, shell-based implementation getPermission runs 
> shell command "ls -ld" to get permission, which take 4 or 5 ms(very slow).
> We should switch to io.nativeio.NativeIO.POSIX#getFstat as implementation in 
> RawLocalFileSystem to get rid of shell-based implementation for FileStatus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3549) use JNI-based FileStatus implementation from io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation from RawLocalFileSystem in checkLocalDir.

2015-05-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532965#comment-14532965
 ] 

Chris Nauroth commented on YARN-3549:
-

bq. Can we not get RawLocalFileSystem to automatically use native fstat if the 
native library is available? That way all users can simply benefit from this 
seamlessly.

That's an interesting idea.  Looking at this more closely, 
{{ResourceLocalizationService#checkLocalDir}} really can't use the existing 
{{NativeIO.POSIX#getFstat}} method anyway.  That one is a passthrough to 
{{fstat}}, which operates on an open file descriptor.  Instead, 
{{ResourceLocalizationService#checkLocalDir}} really wants plain {{stat}} or 
maybe {{lstat}}, which operates on a path string.  Forcing this code path to 
open the file just for the sake of passing an fd to {{fstat}} isn't ideal.

Let's try it!  I filed HADOOP-11935 as a pre-requisite to do the Hadoop Common 
work.
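
Purely as an illustration of the path-based idea (standard java.nio on a POSIX file system, not the JNI implementation being proposed here):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

public class PathStatExample {
  public static void main(String[] args) throws IOException {
    // Path-based status lookup: no open file descriptor and no forked
    // "ls -ld" process is required to read owner/group/permissions.
    Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
    PosixFileAttributes attrs =
        Files.readAttributes(dir, PosixFileAttributes.class);
    System.out.println("owner = " + attrs.owner().getName());
    System.out.println("group = " + attrs.group().getName());
    System.out.println("perms = "
        + PosixFilePermissions.toString(attrs.permissions()));
  }
}
{code}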

> use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> 
>
> Key: YARN-3549
> URL: https://issues.apache.org/jira/browse/YARN-3549
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> As discussed in YARN-3491, shell-based implementation getPermission runs 
> shell command "ls -ld" to get permission, which take 4 or 5 ms(very slow).
> We should switch to io.nativeio.NativeIO.POSIX#getFstat as implementation in 
> RawLocalFileSystem to get rid of shell-based implementation for FileStatus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546365#comment-14546365
 ] 

Chris Nauroth commented on YARN-3626:
-

I don't fully understand the objection to the former patch that had been 
committed.

bq. The new configuration added is supposed to be per app, but it is now a 
server side configuration.

There was a new YARN configuration property for triggering this behavior, but 
the MR application would toggle on that YARN property only if the MR job 
submission had {{MAPREDUCE_JOB_USER_CLASSPATH_FIRST}} on.  From {{MRApps}}:

{code}
boolean userClassesTakesPrecedence = 
  conf.getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);

if (userClassesTakesPrecedence) {
  conf.set(YarnConfiguration.YARN_APPLICATION_CLASSPATH_PREPEND_DISTCACHE,
"true");
}
{code}

I thought this implemented "per app" behavior, because it could vary between MR 
app submission instances.  It would not be a requirement to put 
{{YARN_APPLICATION_CLASSPATH_PREPEND_DISTCACHE}} into the server configs and 
have the client and server share configs.

Is there a detail I'm missing?
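
For example, a minimal client-side sketch of the per-job toggle (illustrative only; it assumes the standard MR client API is on the classpath and simply sets the property named above before submission):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UserClasspathFirstExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Per-job, client-side toggle; MRApps then sets the YARN prepend-distcache
    // property (shown above) for this submission only.
    conf.setBoolean("mapreduce.job.user.classpath.first", true);
    Job job = Job.getInstance(conf, "user-classpath-first-demo");
    // ... configure mapper/reducer/input/output as usual, then job.submit() ...
  }
}
{code}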

> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
> YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546400#comment-14546400
 ] 

Chris Nauroth commented on YARN-3626:
-

I see now.  Thanks for the clarification.  In that case, I agree with the new 
proposal.

> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
> YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations

2015-05-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552680#comment-14552680
 ] 

Chris Nauroth commented on YARN-3685:
-

[~vinodkv], thanks for the notification.  I was not aware of this design goal 
at the time of YARN-316.

Perhaps it's possible to move the classpath jar generation to the MR client or 
AM.  It's not immediately obvious to me which of those 2 choices is better.  
We'd need to change the manifest to use relative paths in the Class-Path 
attribute instead of absolute paths.  (The client and AM are not aware of the 
exact layout of the NodeManager's {{yarn.nodemanager.local-dirs}}, so the 
client can't predict the absolute paths at time of container launch.)

There is one piece of logic that I don't see how to handle though.  Some 
classpath entries are defined in terms of environment variables.  These 
environment variables are expanded at the NodeManager via the container launch 
scripts.  This was true of Linux even before YARN-316, so in that sense, YARN 
did already have some classpath logic indirectly.  Environment variables cannot 
be used inside a manifest's Class-Path, so for Windows, NodeManager expands the 
environment variables before populating Class-Path.  It would be incorrect to 
do the environment variable expansion at the MR client, because it might be 
running with different configuration than the NodeManager.  I suppose if the AM 
did the expansion, then that would work in most cases, but it creates an 
assumption that the AM container is running with configuration that matches all 
NodeManagers in the cluster.  I don't believe that assumption exists today.

If we do move classpath handling out of the NodeManager, then it would be a 
backwards-incompatible change, and so it could not be shipped in the 2.x 
release line.
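
To illustrate the relative Class-Path idea mentioned above, here is a minimal sketch using only java.util.jar (not the NodeManager's actual implementation; the entry names are made up):

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class RelativeClasspathJar {
  // Writes an otherwise empty jar whose manifest Class-Path holds relative,
  // space-separated entries, leaving absolute resolution to the launch host.
  static void writeClasspathJar(String jarPath, String... relativeEntries)
      throws IOException {
    Manifest manifest = new Manifest();
    Attributes main = manifest.getMainAttributes();
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.put(Attributes.Name.CLASS_PATH, String.join(" ", relativeEntries));
    try (JarOutputStream out =
        new JarOutputStream(new FileOutputStream(jarPath), manifest)) {
      // No jar entries are needed; the manifest alone carries the classpath.
    }
  }

  public static void main(String[] args) throws IOException {
    writeClasspathJar("classpath.jar", "lib/job.jar", "lib/", "conf/");
  }
}
{code}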

> NodeManager unnecessarily knows about classpath-jars due to Windows 
> limitations
> ---
>
> Key: YARN-3685
> URL: https://issues.apache.org/jira/browse/YARN-3685
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Found this while looking at cleaning up ContainerExecutor via YARN-3648, 
> making it a sub-task.
> YARN *should not* know about classpaths. Our original design modeled around 
> this. But when we added windows support, due to classpath issues, we ended 
> up breaking this abstraction via YARN-316. We should clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558495#comment-14558495
 ] 

Chris Nauroth commented on YARN-3626:
-

Hi Craig.  This looks good to me.  I have just one minor nitpick.  I think the 
logic in {{ContainerLaunch}} for setting {{preferLocalizedJars}} could be 
simplified to this:

{code}
boolean preferLocalizedJars = Boolean.valueOf(classpathPrependDistCache);
{code}

{{Boolean#valueOf}} is null-safe.
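
As a quick standalone illustration of that null-safety (the variable name just mirrors the snippet above):

{code}
public class BooleanValueOfCheck {
  public static void main(String[] args) {
    // Boolean.valueOf(String) returns false for null, so no explicit
    // null guard is needed around the environment/config lookup.
    String classpathPrependDistCache = null;
    System.out.println(Boolean.valueOf(classpathPrependDistCache)); // false
    System.out.println(Boolean.valueOf("true"));                    // true
    System.out.println(Boolean.valueOf("TRUE"));                    // true
    System.out.println(Boolean.valueOf("yes"));                     // false
  }
}
{code}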

Thanks!

> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.11.patch, 
> YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations

2015-05-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559464#comment-14559464
 ] 

Chris Nauroth commented on YARN-3685:
-

bq. Which ones are these?

I was thinking of stuff like {{yarn.application.classpath}}, where values are 
defined in terms of things like the {{HADOOP_YARN_HOME}} and 
{{HADOOP_COMMON_HOME}} environment variables, and those values might not match 
the file system layout at the client side.

bq. Not clear this is true or not. Have to see the final solution/patch to 
realistically reason about this.

Let's say hypothetically a change for this goes into 2.8.0.  I was thinking 
that would make it impossible for a 2.7.0 client to work correctly with a 2.8.0 
NodeManager, because that client wouldn't take care of classpath bundling and 
instead expect the NodeManager to do it.

Brainstorming a bit, maybe we can figure out a way for a 2.8.0 NodeManager to 
detect if the client hasn't already taken care of classpath bundling, and if 
not, stick to the current logic.  Backwards-compatibility logic like this would 
go into branch-2, but could be dropped from trunk.

> NodeManager unnecessarily knows about classpath-jars due to Windows 
> limitations
> ---
>
> Key: YARN-3685
> URL: https://issues.apache.org/jira/browse/YARN-3685
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>
> Found this while looking at cleaning up ContainerExecutor via YARN-3648, 
> making it a sub-task.
> YARN *should not* know about classpaths. Our original design modeled around 
> this. But when we added windows support, due to classpath issues, we ended 
> up breaking this abstraction via YARN-316. We should clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559939#comment-14559939
 ] 

Chris Nauroth commented on YARN-3626:
-

Thanks, Craig!  We could potentially stick the {{@Private}} annotation directly 
onto {{ApplicationConstants#CLASSPATH_PREPEND_DISTCACHE}}.  I'll let Vinod 
chime in on whether or not this was the intent of his feedback.

+1 from me, pending Jenkins run.

> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.11.patch, 
> YARN-3626.14.patch, YARN-3626.15.patch, YARN-3626.4.patch, YARN-3626.6.patch, 
> YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations

2015-05-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560391#comment-14560391
 ] 

Chris Nauroth commented on YARN-3685:
-

bq. YARN_APPLICATION_CLASSPATH is essentially unused.

In that case, this is definitely worth revisiting as part of this issue.  
Perhaps it's not a problem anymore.  This had been used in the past, as seen in 
bug reports like YARN-1138.

> NodeManager unnecessarily knows about classpath-jars due to Windows 
> limitations
> ---
>
> Key: YARN-3685
> URL: https://issues.apache.org/jira/browse/YARN-3685
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>
> Found this while looking at cleaning up ContainerExecutor via YARN-3648, 
> making it a sub-task.
> YARN *should not* know about classpaths. Our original design modeled around 
> this. But when we added windows support, due to classpath issues, we ended 
> up breaking this abstraction via YARN-316. We should clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3786) Document yarn class path options

2015-06-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3786:

Component/s: documentation
 Issue Type: Improvement  (was: Bug)

> Document yarn class path options
> 
>
> Key: YARN-3786
> URL: https://issues.apache.org/jira/browse/YARN-3786
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3786.patch
>
>
> --glob, --jar options are not documented.
> {code}
> $ yarn classpath --help
> classpath [--glob|--jar <path>|-h|--help] :
>   Prints the classpath needed to get the Hadoop jar and the required
>   libraries.
>   Options:
>   --glob       expand wildcards
>   --jar <path> write classpath as manifest in jar named <path>
>   -h, --help   print help
> {code}
> current document:
> {code}
> User Commands
> Commands useful for users of a hadoop cluster.
> classpath
> Usage: yarn classpath
> Prints the class path needed to get the Hadoop jar and the required libraries
> {code}
> http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#classpath



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created YARN-3834:
---

 Summary: Scrub debug logging of tokens during resource 
localization.
 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During resource localization, the NodeManager logs tokens at debug level to aid 
troubleshooting.  This includes the full token representation.  Best practice 
is to avoid logging anything secret, even at debug level.  We can improve on 
this by changing the logging to use a scrubbed representation of the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-19 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3834:

Attachment: YARN-3834.001.patch

The attached patch changes the code to use {{Token#toString}}.  The 
{{toString}} method is already coded to be safe for logging, because it does 
not include any representation of the secret.  Thanks also to [~vicaya] for the 
suggestion to add logging of a fingerprint of the full representation, which is 
a one-way hash (non-reversible, therefore safe).
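
For anyone curious, the general pattern looks roughly like the following standalone sketch (illustrative only, not the patch itself; the byte array is a hypothetical stand-in for the token's serialized form):

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenFingerprint {
  // Returns a non-reversible hex fingerprint of the given bytes, safe to log.
  static String fingerprint(byte[] secretBytes) throws NoSuchAlgorithmException {
    byte[] hash = MessageDigest.getInstance("SHA-256").digest(secretBytes);
    StringBuilder hex = new StringBuilder();
    for (byte b : hash) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // Hypothetical stand-in for a serialized token; never log these bytes.
    byte[] secretBytes = "example-token-bytes".getBytes(StandardCharsets.UTF_8);
    System.out.println("token fingerprint (SHA-256): " + fingerprint(secretBytes));
  }
}
{code}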

> Scrub debug logging of tokens during resource localization.
> ---
>
> Key: YARN-3834
> URL: https://issues.apache.org/jira/browse/YARN-3834
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: YARN-3834.001.patch
>
>
> During resource localization, the NodeManager logs tokens at debug level to 
> aid troubleshooting.  This includes the full token representation.  Best 
> practice is to avoid logging anything secret, even at debug level.  We can 
> improve on this by changing the logging to use a scrubbed representation of 
> the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595338#comment-14595338
 ] 

Chris Nauroth commented on YARN-3834:
-

Xuan, thank you for the code review and commit.

> Scrub debug logging of tokens during resource localization.
> ---
>
> Key: YARN-3834
> URL: https://issues.apache.org/jira/browse/YARN-3834
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: 2.8.0
>
> Attachments: YARN-3834.001.patch
>
>
> During resource localization, the NodeManager logs tokens at debug level to 
> aid troubleshooting.  This includes the full token representation.  Best 
> practice is to avoid logging anything secret, even at debug level.  We can 
> improve on this by changing the logging to use a scrubbed representation of 
> the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5121) fix some container-executor portability issues

2016-07-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400171#comment-15400171
 ] 

Chris Nauroth commented on YARN-5121:
-

[~aw], thank you for this patch.  I have confirmed a successful full build and 
run of test-container-executor on OS X and Linux.

Just a few questions:

bq. For 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/{fstatat|openat|unlinkat}.h:

I just want to double-check with you that the fchmodat.h and fdopendir.h 
implementations are not BSD-licensed, and that's why they're not listed in 
LICENSE.txt and instead have an Apache license header.  Is that correct?

{code}
  fprintf(stderr,"ret = %s\n", ret);
{code}

Chris D mentioned previously that this might have been a leftover from 
debugging.  Did you intend to keep it, or should we drop it?

{code}
char* get_executable() {
 return __get_exec_readproc("/proc/self/path/a.out");
}
{code}

Please check the indentation on the return statement.

Is "/proc/self/path/a.out" correct?  The /proc/self part makes sense to me, but 
the rest of it surprised me.  Is that a.out like the default gcc binary output 
path?  I have nearly zero experience with Solaris, so I trust your knowledge 
here.  :-)


> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, YARN-5121.06.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5121) fix some container-executor portability issues

2016-07-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400252#comment-15400252
 ] 

Chris Nauroth commented on YARN-5121:
-

Thanks for the detailed explanation.  It's all clear to me now.  I expect this 
will be ready to commit after your next revision to fix the few remaining 
nitpicks.  That next revision can fix the one remaining compiler warning too.

[~chris.douglas], let us know if you have any more feedback.  If not, then I 
would likely +1 and commit soon.

bq. (This whole conversation is rather timely, given that Roger Faulkner just 
passed away recently.)

I did not know the name before, but I just read an "In Memoriam" article.  
Thank you, Roger.

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, YARN-5121.06.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5121) fix some container-executor portability issues

2016-07-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400278#comment-15400278
 ] 

Chris Nauroth commented on YARN-5121:
-

Allen, sorry, I just spotted one more thing.  Would you please check for 
{{NULL}} returns from the {{malloc}} calls in {{__get_exec_readproc}} and the 
OS X implementation of {{get_executable}}?

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402719#comment-15402719
 ] 

Chris Nauroth commented on YARN-5456:
-

[~aw], thank you for the patch.  I ran it on OS X, Linux and FreeBSD.  I think 
this will be ready to go after adding error checks on the {{malloc}} call and 
discussing a testing obstacle I'm hitting.

I'm running {{test-container-executor}}, and it passes everywhere except my 
FreeBSD VM.  In target/native-results/test-container-executor.stdout, I see 
this:

{code}
Testing delete_container()
Can't chmod /tmp/test-container-executor/local-1/usercache/cnauroth to add the 
sticky bit - Operation not permitted
Can't chmod /tmp/test-container-executor/local-2/usercache/cnauroth to add the 
sticky bit - Operation not permitted
Can't chmod /tmp/test-container-executor/local-3/usercache/cnauroth to add the 
sticky bit - Operation not permitted
Can't chmod /tmp/test-container-executor/local-4/usercache/cnauroth to add the 
sticky bit - Operation not permitted
Can't chmod /tmp/test-container-executor/local-5/usercache/cnauroth to add the 
sticky bit - Operation not permitted
FAIL: failed to initialize user cnauroth
{code}

That error comes from this code in container-executor.c:

{code}
int create_directory_for_user(const char* path) {
  // set 2750 permissions and group sticky bit
  mode_t permissions = S_IRWXU | S_IRGRP | S_IXGRP | S_ISGID;
...
  if (chmod(path, permissions) != 0) {
fprintf(LOGFILE, "Can't chmod %s to add the sticky bit - %s\n",
path, strerror(errno));
ret = -1;
{code}

I tried testing {{chmod}} to set the setgid bit, and sure enough it fails on 
FreeBSD.  I can set the setuid bit and the sticky bit.  The problem only 
happens for trying to set the setgid bit when I'm a non-root user.

{code}
> chmod 4750 /tmp/test-container-executor/local-1/usercache/cnauroth

> chmod 2750 /tmp/test-container-executor/local-1/usercache/cnauroth
chmod: /tmp/test-container-executor/local-1/usercache/cnauroth: Operation not 
permitted

> chmod 1750 /tmp/test-container-executor/local-1/usercache/cnauroth
{code}

I don't see this behavior on any other OS.  I assume it's some kind of 
environmental configuration quirk, but I haven't been able to find any tips in 
documentation.  Have you seen this?  Does the test pass for you on FreeBSD?

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: YARN-5456.00.patch
>
>
> YARN-5121 fixed quite a few container-executor portability issues, but it 
> also changed how it determines its location to be very operating-system 
> specific for security reasons.  We should add support for FreeBSD to unbreak 
> its ports entry, NetBSD (the sysctl options are just in a different order), 
> and, for operating systems that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402789#comment-15402789
 ] 

Chris Nauroth commented on YARN-5456:
-

OK, this plan sounds fine to me.  I think the only additional thing we need 
here is the check on the {{malloc}} call.

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: YARN-5456.00.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system specific for security 
> reasons.  We should add support for FreeBSD to unbreak its ports entry, NetBSD 
> (the sysctl options are just in a different order), and, for operating systems 
> that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404892#comment-15404892
 ] 

Chris Nauroth commented on YARN-5456:
-

[~aw], patch 01 looks good.  I verified this on OS X, Linux and FreeBSD.  It's 
cool to see the test passing on FreeBSD this time around!  My only other 
suggestion is to try deploying this change in a secured cluster for a bit of 
manual testing before we commit.

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system specific for security 
> reasons.  We should add support for FreeBSD to unbreak its ports entry, NetBSD 
> (the sysctl options are just in a different order), and, for operating systems 
> that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5838) windows - environement variables aren't accessible on Yarn 3.0 alpha-1

2016-12-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788415#comment-15788415
 ] 

Chris Nauroth commented on YARN-5838:
-

Hello [~rekha.du...@gmail.com].  Are you possibly looking for the 
{{yarn.nodemanager.admin-env}} setting in yarn-site.xml?  Here is a copy-paste 
of the default as defined in yarn-default.xml:

{code}
  <property>
    <description>Environment variables that should be forwarded from the
    NodeManager's environment to the container's.</description>
    <name>yarn.nodemanager.admin-env</name>
    <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
  </property>
{code}


> windows - environement variables aren't accessible on Yarn 3.0 alpha-1
> --
>
> Key: YARN-5838
> URL: https://issues.apache.org/jira/browse/YARN-5838
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
> Environment: windows 7
>Reporter: Kanthirekha
>
> windows environment variables aren't accessible on Yarn 3.0 alpha-1
> tried fetching %Path% from Application master and on the container script 
> (after a container is allocated by application master for task executions)
> echo %Path%  
> result: "ECHO is on." is printed, 
> i.e., the %Path% expansion is blank. 
> Could you please let us know what are the necessary steps to access env 
> variables from yarn 3.0 alpha1 version ? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437095#comment-15437095
 ] 

Chris Nauroth commented on YARN-5551:
-

My understanding agrees with Jason's last comment.  The mapping could last well 
past the deletion of the underlying file, maybe even for the whole lifetime of 
the process, so it's correct to include it in the accounting.

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437256#comment-15437256
 ] 

Chris Nauroth commented on YARN-5551:
-

OK, I get it now.  Thanks, [~gopalv].  I'd be fine proceeding with the change.  
I'm not online until after Labor Day, so I can't do a full code review, test 
and commit.  If anyone else wants to do it, please don't wait for me.

> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534056#comment-15534056
 ] 

Chris Nauroth commented on YARN-4205:
-

I think this patch broke compilation on branch-2.

{code}
  private Map<RMAppToMonitor, Long> monitoredApps =
      new HashMap<RMAppToMonitor, Long>();
{code}

{code}
monitoredApps.putIfAbsent(appToMonitor, timeout);
{code}

[{{Map#putIfAbsent}}|https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#putIfAbsent-K-V-]
 was added in JDK 1.8, but we want to be able to compile branch-2 for JDK 1.7.

Can someone please take a look?
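
For reference, a JDK 1.7-compatible equivalent is simple (illustrative types only; note that {{ConcurrentMap#putIfAbsent}} has existed since Java 5 if atomicity is required):

{code}
import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentJdk7 {
  // JDK 1.7-compatible equivalent of Map#putIfAbsent: insert only when
  // no mapping for the key exists yet (not atomic on a plain HashMap).
  static <K, V> void putIfAbsentCompat(Map<K, V> map, K key, V value) {
    if (!map.containsKey(key)) {
      map.put(key, value);
    }
  }

  public static void main(String[] args) {
    Map<String, Long> monitoredApps = new HashMap<String, Long>();
    putIfAbsentCompat(monitoredApps, "application_0001", 60000L);
    putIfAbsentCompat(monitoredApps, "application_0001", 99999L); // ignored
    System.out.println(monitoredApps); // {application_0001=60000}
  }
}
{code}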

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> 0003-YARN-4205.patch, 0004-YARN-4205.patch, 0005-YARN-4205.patch, 
> 0006-YARN-4205.patch, 0007-YARN-4205.1.patch, 0007-YARN-4205.2.patch, 
> 0007-YARN-4205.patch, YARN-4205_01.patch, YARN-4205_02.patch, 
> YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If the application is running beyond the lifetime, it will be killed. 
> The lifetime will be considered from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534232#comment-15534232
 ] 

Chris Nauroth commented on YARN-4205:
-

+1 for the addendum patch.  [~gtCarrera9], thank you.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> 0003-YARN-4205.patch, 0004-YARN-4205.patch, 0005-YARN-4205.patch, 
> 0006-YARN-4205.patch, 0007-YARN-4205.1.patch, 0007-YARN-4205.2.patch, 
> 0007-YARN-4205.patch, YARN-4205-addendum.001.patch, YARN-4205_01.patch, 
> YARN-4205_02.patch, YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If the application is running beyond the lifetime, it will be killed. 
> The lifetime will be considered from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534277#comment-15534277
 ] 

Chris Nauroth commented on YARN-4205:
-

[~gtCarrera9], we are all clear to use JDK 8 features in trunk, so I think 
committing only to branch-2 is fine.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> 0003-YARN-4205.patch, 0004-YARN-4205.patch, 0005-YARN-4205.patch, 
> 0006-YARN-4205.patch, 0007-YARN-4205.1.patch, 0007-YARN-4205.2.patch, 
> 0007-YARN-4205.patch, YARN-4205-addendum.001.patch, YARN-4205_01.patch, 
> YARN-4205_02.patch, YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If the application is running beyond the lifetime, it will be killed. 
> The lifetime will be considered from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2016-09-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3514:

Assignee: (was: Chris Nauroth)

I'm not actively working on this, so I'm unassigning.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3962) If we change node manager identity to run as virtual account, then resource localization service fails to start with incorrect permission

2015-08-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661004#comment-14661004
 ] 

Chris Nauroth commented on YARN-3962:
-

This looks good to me.  I agree with Xuan that it would be good to find a way 
to add unit tests.  Thank you, Madhumita!
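
For context, the check that trips when owner and group are the same is roughly the following (a minimal sketch, not the actual ResourceLocalizationService code; the path comes from the log in the description):

{code}
// Minimal sketch of the failing check: usercache/filecache are expected to be
// 755 and nmPrivate 700, and a mismatch is treated as a bad local dir.
FileSystem localFs = FileSystem.getLocal(new Configuration());
Path userCache = new Path("c:/apps1/temp/hdfs/nm-local-dir/usercache");
FsPermission expected = new FsPermission((short) 0755);
FsPermission actual = localFs.getFileStatus(userCache).getPermission();
if (!actual.equals(expected)) {
  throw new YarnRuntimeException("Permissions incorrectly set for dir " + userCache
      + ", should be " + expected + ", actual value = " + actual);
}
{code}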

> If we change node manager identity to run as virtual account, then resource 
> localization service fails to start with incorrect permission
> -
>
> Key: YARN-3962
> URL: https://issues.apache.org/jira/browse/YARN-3962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: madhumita chakraborty
> Attachments: YARN-3962-002.patch, Yarn-3962.001.patch
>
>
> For Azure HDInsight we need to change the node manager to run as a virtual 
> account instead of a user account. Otherwise, after an Azure reimage, it won't 
> be able to access the map output data of the running job on that node. But when 
> we changed the nodemanager to run as a virtual account we got this error: 
>  2015-06-02 06:11:45,281 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> c:/apps1/temp/hdfs/nm-local-dir/nmPrivate/container_1433128260970_0007_01_01.tokens.
>  Credentials list: 
>  2015-06-02 06:11:45,313 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Permissions incorrectly set for dir 
> c:/apps1/temp/hdfs/nm-local-dir/usercache, should be rwxr-xr-x, actual value 
> = rwxrwxr-x
>  2015-06-02 06:11:45,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Attempting to initialize c:/apps1/temp/hdfs/nm-local-dir
>  2015-06-02 06:11:45,375 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Permissions incorrectly set for dir 
> c:/apps1/temp/hdfs/nm-local-dir/usercache, should be rwxr-xr-x, actual value 
> = rwxrwxr-x
>  2015-06-02 06:11:45,375 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to setup local dir c:/apps1/temp/hdfs/nm-local-dir, which was marked 
> as good.
>  org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions 
> incorrectly set for dir c:/apps1/temp/hdfs/nm-local-dir/usercache, should be 
> rwxr-xr-x, actual value = rwxrwxr-x
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1400)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1367)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$900(ResourceLocalizationService.java:137)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1085)
>  2015-06-02 06:11:45,375 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
>  org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup 
> local dir c:/apps1/temp/hdfs/nm-local-dir, which was marked as good.
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1372)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$900(ResourceLocalizationService.java:137)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1085)
>  Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> Permissions incorrectly set for dir 
> c:/apps1/temp/hdfs/nm-local-dir/usercache, should be rwxr-xr-x, actual value 
> = rwxrwxr-x
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1400)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1367)
> Fix - When the node manager runs as a virtual account, the resource localization 
> service fails to come up. It checks that the permission of usercache and file 
> cache is 755 and nmPrivate is 700. But on Windows, for a virtual account, the 
> owner and the group are the same, so this permission check fails.

[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358146#comment-14358146
 ] 

Chris Nauroth commented on YARN-3336:
-

Hi [~zxu].  Nice catch.

I think the current version of the patch would change the owner of all obtained 
delegation tokens from the application submitter to the user running the 
ResourceManager daemon (i.e. the "yarn" user).  Instead, can we simply call 
{{close}} on the {{FileSystem}} after {{addDelegationTokens}}?  Closing a 
{{FileSystem}} also has the effect of removing it from the cache.  Since we 
already know that a new instance is getting created every time through this 
code path, I don't think closing the instance can impact any other threads.
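
To make that concrete, here is a rough sketch of the suggested change (sketch only, not the final patch):

{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
    @Override
    public Token<?>[] run() throws Exception {
      FileSystem fs = FileSystem.get(getConfig());
      try {
        return fs.addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      } finally {
        // Closing also evicts this per-Subject instance from FileSystem.CACHE,
        // so repeated calls no longer accumulate cache entries.
        fs.close();
      }
    }
  });
}
{code}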

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372127#comment-14372127
 ] 

Chris Nauroth commented on YARN-3336:
-

Hello, [~zxu].  Thank you for providing the new patch and adding the test.

I think we can avoid the changes in {{FileSystem}} by adding an instance 
counter to {{MyFS}}.  We can increment it in the constructor and decrement it 
in {{close}}.  Then, the test can get the value of the counter before making 
the calls to {{obtainSystemTokensForUser}} and assert that the counter has the 
same value after those calls.
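
Something along these lines (sketch only; {{MyFS}} here is a stand-in for the test FileSystem registered in the configuration, and {{delegationTokenRenewer}} is a placeholder for the renewer under test):

{code}
// Sketch of the instance-counter idea: track how many live test FileSystem
// instances exist, so the test can assert that each created instance is closed.
public static class MyFS extends LocalFileSystem {
  private static final AtomicInteger liveInstances = new AtomicInteger();

  public MyFS() {
    liveInstances.incrementAndGet();
  }

  @Override
  public void close() throws IOException {
    liveInstances.decrementAndGet();
    super.close();
  }

  public static int getLiveInstances() {
    return liveInstances.get();
  }
}

// In the test:
int before = MyFS.getLiveInstances();
delegationTokenRenewer.obtainSystemTokensForUser("user1", new Credentials());
delegationTokenRenewer.obtainSystemTokensForUser("user2", new Credentials());
// If obtainSystemTokensForUser closes what it creates, no instances leak.
assertEquals(before, MyFS.getLiveInstances());
{code}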

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3336:

Target Version/s: 2.7.0
Hadoop Flags: Reviewed

+1 for patch v004 pending Jenkins.

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376914#comment-14376914
 ] 

Chris Nauroth commented on YARN-3336:
-

[~zxu], I apologize, but I missed entering your name on the git commit message:

{code}
commit 6ca1f12024fd7cec7b01df0f039ca59f3f365dc1
Author: cnauroth 
Date:   Mon Mar 23 10:45:50 2015 -0700

YARN-3336. FileSystem memory leak in DelegationTokenRenewer.
{code}

Unfortunately, this isn't something we can change, because it could mess up the 
git history.

You're still there in CHANGES.txt though, so you get the proper credit for the 
patch:

{code}
YARN-3336. FileSystem memory leak in DelegationTokenRenewer.
(Zhihai Xu via cnauroth)
{code}


> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-04-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503666#comment-14503666
 ] 

Chris Nauroth commented on YARN-3514:
-

[~john.lil...@redpoint.net], thank you for the detailed bug report.

I believe the root cause is likely to be in container localization's URI 
parsing to construct the local download path.  The relevant code is in 
{{ContainerLocalizer#download}}:

{code}
  Callable<Path> download(Path path, LocalResource rsrc,
  UserGroupInformation ugi) throws IOException {
DiskChecker.checkDir(new File(path.toUri().getRawPath()));
return new FSDownload(lfs, ugi, conf, path, rsrc);
  }
{code}

We're taking a {{Path}} and converting it to URI form, but I don't think 
{{getRawPath}} is the correct call for us to access the path portion of the 
URI.  A possible fix would be to switch to {{getPath}}, which would actually 
decode back to the original form.

{code}
scala> new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getRawPath()
new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getRawPath()
res4: java.lang.String = domain%5Chadoopuser

scala> new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getPath()
new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getPath()
res5: java.lang.String = domain\hadoopuser
{code}


> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-04-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3514:

Attachment: YARN-3514.001.patch

I'm attaching a patch with the fix I described in my last comment.  I added a 
test that passes a file name containing a '\' character through localization.  
With the existing code using {{URI#getRawPath}}, the test fails as shown below. 
 (Note the incorrect URI-encoded path, similar to the reported symptom in the 
description.)  After switching to {{URI#getPath}}, the test passes as expected.

{code}
Failed tests: 
  TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath:265 
Argument(s) are different! Wanted:
containerLocalizer.checkDir(/my\File);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath(TestContainerLocalizer.java:265)
Actual invocation has different arguments:
containerLocalizer.checkDir(/my%5CFile);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath(TestContainerLocalizer.java:264)
{code}


> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
> Attachments: YARN-3514.001.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-04-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3514:

 Component/s: (was: yarn)
  nodemanager
Target Version/s: 2.8.0
Assignee: Chris Nauroth

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-04-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3514:

Attachment: YARN-3514.002.patch

In the first patch, the new test passed for me locally but failed on Jenkins.  
I think this is because I was using a hard-coded destination path for the 
localized resource, and this might have caused a permissions violation on the 
Jenkins host.  Here is patch v002.  I changed the test so that the localized 
resource is relative to the user's filecache, which is in the proper test 
working directory.  I also added a second test to make sure that we don't 
accidentally URI-decode anything.

bq. I am very impressed with the short time it took to patch.

Thanks!  Before we declare victory though, can you check that your local file 
system allows the '\' character in file and directory names?  The patch here 
definitely fixes a bug, but testing the '\' character on your local file system 
will tell us whether or not the whole problem is resolved for your deployment.  
Even better would be if you have the capability to test with my patch applied.


> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505711#comment-14505711
 ] 

Chris Nauroth commented on YARN-3514:
-

[~john.lil...@redpoint.net], thank you for the confirmation.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3549) use JNI-based FileStatus implementation from io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation from RawLocalFileSystem in checkLocalDir.

2015-04-27 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514842#comment-14514842
 ] 

Chris Nauroth commented on YARN-3549:
-

Hi [~zxu].  Since this is a proposal to call native code, I'd like to make sure 
test suites are passing on both Linux and Windows when it's ready.  The method 
is implemented for both Linux and Windows, so I do expect it would work fine, 
but I'd like to make sure.  If you don't have access to a Windows VM for 
testing, I'd be happy to volunteer to test on Windows for you when a patch is 
ready.  Thanks!
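
For reference, the proposed call path looks roughly like this (a sketch assuming the existing {{NativeIO.POSIX}} accessors; {{localFile}} is a placeholder, and this is not the eventual patch):

{code}
// Sketch: stat an already-open file via JNI instead of shelling out to "ls -ld".
static void printOwnership(File localFile) throws IOException {
  if (!NativeIO.isAvailable()) {
    return; // fall back to the existing shell-based path
  }
  try (FileInputStream fis = new FileInputStream(localFile)) {
    NativeIO.POSIX.Stat stat = NativeIO.POSIX.getFstat(fis.getFD());
    System.out.println(stat.getOwner() + ":" + stat.getGroup());
    // The permission bits travel on the same Stat object, so no shell fork is needed.
  }
}
{code}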

> use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> 
>
> Key: YARN-3549
> URL: https://issues.apache.org/jira/browse/YARN-3549
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Use JNI-based FileStatus implementation from 
> io.nativeio.NativeIO.POSIX#getFstat instead of shell-based implementation 
> from RawLocalFileSystem in checkLocalDir.
> As discussed in YARN-3491, shell-based implementation getPermission runs 
> shell command "ls -ld" to get permission, which take 4 or 5 ms(very slow).
> We should switch to io.nativeio.NativeIO.POSIX#getFstat as implementation in 
> RawLocalFileSystem to get rid of shell-based implementation for FileStatus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3524) Mapreduce failed due to AM Container-Launch failure at NM on windows

2015-04-27 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-3524.
-
Resolution: Not A Problem

Hello [~KaveenBigdata].  Nice debugging!

The native components for Hadoop on Windows are built using either Windows SDK 
7.1 or Visual Studio 2010.  Because of this, there is a runtime dependency on 
the C++ 2010 runtime dll, which is MSVCR100.dll.  You are correct that the fix 
in this case is to install the missing dll.  I believe this is the official 
download location:

https://www.microsoft.com/en-us/download/details.aspx?id=13523

Since this does not represent a bug in the Hadoop codebase, I'm resolving this 
issue as Not a Problem.

> Mapreduce failed due to AM Container-Launch failure at NM on windows
> 
>
> Key: YARN-3524
> URL: https://issues.apache.org/jira/browse/YARN-3524
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.2
> Environment: Windows server 2012 and Windows-8
> Hadoop-2.5.2
> Java-1.7
>Reporter: Kaveen Raajan
>
> I tried to run a TEZ job on a Windows machine. 
> I successfully built Tez-0.6.0 against Hadoop-2.5.2.
> Then I configured Tez-0.6.0 as described in http://tez.apache.org/install.html
> But I faced the following error while running this command.
> Note: I'm using a HADOOP High Availability setup.
> {code}
> Running OrderedWordCount
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/C:/Hadoop/
> share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBind
> er.class]
> SLF4J: Found binding in [jar:file:/C:/Tez/lib
> /slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 15/04/15 10:47:57 INFO client.TezClient: Tez Client Version: [ 
> component=tez-api
> , version=0.6.0, revision=${buildNumber}, 
> SCM-URL=scm:git:https://git-wip-us.apa
> che.org/repos/asf/tez.git, buildTime=2015-04-15T01:13:02Z ]
> 15/04/15 10:48:00 INFO client.TezClient: Submitting DAG application with id: 
> app
> lication_1429073725727_0005
> 15/04/15 10:48:00 INFO Configuration.deprecation: fs.default.name is 
> deprecated.
>  Instead, use fs.defaultFS
> 15/04/15 10:48:00 INFO client.TezClientUtils: Using tez.lib.uris value from 
> conf
> iguration: hdfs://HACluster/apps/Tez/,hdfs://HACluster/apps/Tez/lib/
> 15/04/15 10:48:01 INFO client.TezClient: Stage directory /tmp/app/tez/sta
> ging doesn't exist and is created
> 15/04/15 10:48:01 INFO client.TezClient: Tez system stage directory 
> hdfs://HACluster
> /tmp/app/tez/staging/.tez/application_1429073725727_0005 doesn't ex
> ist and is created
> 15/04/15 10:48:02 INFO client.TezClient: Submitting DAG to YARN, 
> applicationId=a
> pplication_1429073725727_0005, dagName=OrderedWordCount
> 15/04/15 10:48:03 INFO impl.YarnClientImpl: Submitted application 
> application_14
> 29073725727_0005
> 15/04/15 10:48:03 INFO client.TezClient: The url to track the Tez AM: 
> http://MASTER_NN1:8088/proxy/application_1429073725727_0005/
> 15/04/15 10:48:03 INFO client.DAGClientImpl: Waiting for DAG to start running
> 15/04/15 10:48:09 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
> OrderedWordCount failed with diagnostics: [Application 
> application_1429073725727
> _0005 failed 2 times due to AM Container for 
> appattempt_1429073725727_0005_0
> 2 exited with  exitCode: -1073741515 due to: Exception from container-launch: 
> Ex
> itCodeException exitCode=-1073741515:
> ExitCodeException exitCode=-1073741515:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:
> 702)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.la
> unchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
> ontainerLaunch.call(ContainerLaunch.java:300)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
> ontainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
> java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
> .java:615)
> at java.lang.Thread.run(Thread.java:744)
> 1 file(s) moved.
> Container exited with a non-zero exit code -1073741515
> .Failing this attempt.. Failing the application.]
> {code}
> While looking at the ResourceManager log:
> {code}
> 2015-04-19 21:49:57,533 INFO 
> org

[jira] [Updated] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.

2014-09-15 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2549:

Hadoop Flags: Reviewed

Thank you, Arpit.  I committed this to trunk and branch-2.

> TestContainerLaunch fails due to classpath problem with hamcrest classes.
> -
>
> Key: YARN-2549
> URL: https://issues.apache.org/jira/browse/YARN-2549
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager, test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-2549.1.patch
>
>
> The mockito jar bundles its own copy of the hamcrest classes, and it's ahead 
> of our hamcrest dependency jar on the test classpath for 
> hadoop-yarn-server-nodemanager.  Unfortunately, the version bundled in 
> mockito doesn't match the version we need, so it's missing the 
> {{CoreMatchers#containsString}} method.  This causes the tests to fail with 
> {{NoSuchMethodError}} on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.

2014-10-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2662:

Attachment: YARN-2662.1.patch

Here is a small patch with the fix.  The bug had been causing test failures on 
Windows.  With this patch, the test now passes on Windows.
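
The general shape of the fix (a sketch of the pattern, not the attached patch verbatim) is to close every stream the test opens on a cgroups file:

{code}
// Sketch: read back the single value a test wrote into a fake cgroups file,
// closing the stream even if an assertion later throws.
private static String readCgroupsValue(File cgroupsFile) throws IOException {
  try (BufferedReader reader = new BufferedReader(new FileReader(cgroupsFile))) {
    return reader.readLine();
  } // the reader (and its file descriptor) is closed here
}
{code}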

> TestCgroupsLCEResourcesHandler leaks file descriptors.
> --
>
> Key: YARN-2662
> URL: https://issues.apache.org/jira/browse/YARN-2662
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-2662.1.patch
>
>
> {{TestCgroupsLCEResourcesHandler}} includes tests that write and read values 
> from the various cgroups files.  After the tests read from a file, they do 
> not close it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.

2014-10-08 Thread Chris Nauroth (JIRA)
Chris Nauroth created YARN-2662:
---

 Summary: TestCgroupsLCEResourcesHandler leaks file descriptors.
 Key: YARN-2662
 URL: https://issues.apache.org/jira/browse/YARN-2662
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestCgroupsLCEResourcesHandler}} includes tests that write and read values 
from the various cgroups files.  After the tests read from a file, they do not 
close it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.

2014-10-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164326#comment-14164326
 ] 

Chris Nauroth commented on YARN-2662:
-

The release audit warning is unrelated.

> TestCgroupsLCEResourcesHandler leaks file descriptors.
> --
>
> Key: YARN-2662
> URL: https://issues.apache.org/jira/browse/YARN-2662
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-2662.1.patch
>
>
> {{TestCgroupsLCEResourcesHandler}} includes tests that write and read values 
> from the various cgroups files.  After the tests read from a file, they do 
> not close it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2668) yarn-registry JAR won't link against ZK 3.4.5

2014-10-09 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2668:

Hadoop Flags: Reviewed

+1 for the patch, pending Jenkins.  Thanks, Steve.

> yarn-registry JAR won't link against ZK 3.4.5
> -
>
> Key: YARN-2668
> URL: https://issues.apache.org/jira/browse/YARN-2668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2668-001.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> It's been reported that the registry code doesn't link against ZK 3.4.5 as 
> the enable/disable SASL client property isn't there, which went in with 
> ZOOKEEPER-1657.
> pulling in the constant and {{isEnabled()}} check will ensure registry 
> linkage, even though the ability for a client to disable SASL auth will be 
> lost.
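
Roughly what that looks like (sketch only; the property name comes from ZOOKEEPER-1657, and the actual registry code may differ):

{code}
// Local copy of the ZK 3.4.6+ client SASL switch, so the registry still links
// against ZK 3.4.5, where ZooKeeperSaslClient has no such constant or isEnabled().
public static final String PROP_ZK_ENABLE_SASL_CLIENT = "zookeeper.sasl.client";
public static final String DEFAULT_ZK_ENABLE_SASL_CLIENT = "true";

/** Is client SASL enabled? Mirrors the ZOOKEEPER-1657 check. */
public static boolean isClientSaslEnabled() {
  return Boolean.parseBoolean(
      System.getProperty(PROP_ZK_ENABLE_SASL_CLIENT, DEFAULT_ZK_ENABLE_SASL_CLIENT));
}
{code}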



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2668) yarn-registry JAR won't link against ZK 3.4.5

2014-10-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167333#comment-14167333
 ] 

Chris Nauroth commented on YARN-2668:
-

Thanks for catching that, Steve.  +1 for patch v2 pending jenkins.

> yarn-registry JAR won't link against ZK 3.4.5
> -
>
> Key: YARN-2668
> URL: https://issues.apache.org/jira/browse/YARN-2668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2668-001.patch, YARN-2668-002.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> It's been reported that the registry code doesn't link against ZK 3.4.5 as 
> the enable/disable SASL client property isn't there, which went in with 
> ZOOKEEPER-1657.
> pulling in the constant and {{isEnabled()}} check will ensure registry 
> linkage, even though the ability for a client to disable SASL auth will be 
> lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2689) TestSecureRMRegistryOperations failing on windows

2014-10-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172629#comment-14172629
 ] 

Chris Nauroth commented on YARN-2689:
-

Hi Steve.  Thanks for fixing this.  The patch looks good to me, and I verified 
that it fixes the tests on Windows.  However, I'm seeing a problem when running 
on Mac and Linux.  It's hanging while executing {{ktutil}}.  On my systems, 
{{ktutil}} is an interactive command, so the tests are starting up the child 
process, and then it's never exiting.  (See stack trace below.)  Some quick 
searching indicates that some installations of {{ktutil}} are non-interactive, 
but others are entirely interactive (MIT for example).

{code}
"JUnit" prio=10 tid=0x7f424488c000 nid=0x675 runnable [0x7f4239c02000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:236)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0xed550918> (a java.io.BufferedInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
- locked <0xed3df918> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:153)
at java.io.BufferedReader.read1(BufferedReader.java:204)
at java.io.BufferedReader.read(BufferedReader.java:278)
- locked <0xed3df918> (a java.io.InputStreamReader)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:721)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:530)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:708)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:797)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:780)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
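
As an aside, a hedged sketch of how a hang like this can be bounded, assuming 
Hadoop's {{Shell.ShellCommandExecutor}} constructor that accepts a timeout in 
milliseconds; the command arguments below are purely illustrative.

{code}
import java.io.File;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class BoundedKtutilProbe {
  public static void main(String[] args) throws Exception {
    // With a timeout, an interactive ktutil build surfaces as a timed-out
    // command instead of blocking the JUnit thread on stdout forever.
    ShellCommandExecutor exec = new ShellCommandExecutor(
        new String[] {"ktutil", "--version"},  // illustrative arguments only
        new File("."), null, 5000L);           // 5 second ceiling (assumed API)
    exec.execute();
    System.out.println(exec.getOutput());
  }
}
{code}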


> TestSecureRMRegistryOperations failing on windows
> -
>
> Key: YARN-2689
> URL: https://issues.apache.org/jira/browse/YARN-2689
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.6.0
> Environment: Windows server, Java 7, ZK 3.4.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2689-001.patch
>
>
> the micro ZK service used in the {{TestSecureRMRegistryOperations}} test 
> doesn't start on Windows, 
> {code}
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Could 
> not configure server because SASL configuration did not allow the  ZooKeeper 
> server to authenticate itself properly: 
> javax.security.auth.login.LoginException: Unable to obtain password from user
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2689) TestSecureRMRegistryOperations failing on windows

2014-10-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172676#comment-14172676
 ] 

Chris Nauroth commented on YARN-2689:
-

+1 for the patch.  Thanks again, Steve.

> TestSecureRMRegistryOperations failing on windows
> -
>
> Key: YARN-2689
> URL: https://issues.apache.org/jira/browse/YARN-2689
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.6.0
> Environment: Windows server, Java 7, ZK 3.4.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2689-001.patch
>
>
> the micro ZK service used in the {{TestSecureRMRegistryOperations}} test 
> doesn't start on Windows, 
> {code}
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Could 
> not configure server because SASL configuration did not allow the  ZooKeeper 
> server to authenticate itself properly: 
> javax.security.auth.login.LoginException: Unable to obtain password from user
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2692) ktutil test hanging on some machines/ktutil versions

2014-10-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2692:

Hadoop Flags: Reviewed

+1 for the patch.  I agree that we're not really losing any test coverage by 
removing this.  {{TestSecureRegistry}} will make use of the same keytab file 
implicitly.

> ktutil test hanging on some machines/ktutil versions
> 
>
> Key: YARN-2692
> URL: https://issues.apache.org/jira/browse/YARN-2692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2692-001.patch
>
>
> a couple of the registry security tests run native {{ktutil}}; this is 
> primarily to debug the keytab generation. [~cnauroth] reports that some 
> versions of {{ktutil}} hang. Fix: rm the tests. [YARN-2689]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2720:

 Component/s: nodemanager
Target Version/s: 2.6.0

> Windows: Wildcard classpath variables not expanded against resources 
> contained in archives
> --
>
> Key: YARN-2720
> URL: https://issues.apache.org/jira/browse/YARN-2720
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Craig Welch
>Assignee: Craig Welch
>
> On windows there are limitations to the length of command lines and 
> environment variables which prevent placing all classpath resources into 
> these elements.  Instead, a jar containing only a classpath manifest is 
> created to provide the classpath.  During this process wildcard references 
> are expanded by inspecting the filesystem.  Since archives are extracted to a 
> different location and linked into the final location after the classpath jar 
> is created, resources referred to via wildcards which exist in localized 
> archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
> these entries are removed from the final classpath for the container they are 
> not on the container's classpath as they should be.
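
For readers unfamiliar with the technique, a hypothetical sketch of a manifest-only 
classpath jar using {{java.util.jar}}; the entry list and output path are 
illustrative, and this is not the NodeManager implementation.

{code}
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {
  public static void main(String[] args) throws Exception {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // The Class-Path attribute carries the long classpath so the command line
    // stays short. Wildcards must be expanded to concrete entries here; any
    // resource that only exists inside a not-yet-linked archive is missed,
    // which is the bug this issue describes.
    attrs.put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/b.jar conf/");
    try (JarOutputStream jar = new JarOutputStream(
        new FileOutputStream("classpath.jar"), manifest)) {
      // No entries are needed; the manifest alone provides the classpath.
    }
  }
}
{code}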



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2720:

Hadoop Flags: Reviewed

+1 for the patch, pending Jenkins run.  I've verified that this works in my 
environment with a few test runs.  Thank you for fixing this, Craig.

> Windows: Wildcard classpath variables not expanded against resources 
> contained in archives
> --
>
> Key: YARN-2720
> URL: https://issues.apache.org/jira/browse/YARN-2720
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch
>
>
> On windows there are limitations to the length of command lines and 
> environment variables which prevent placing all classpath resources into 
> these elements.  Instead, a jar containing only a classpath manifest is 
> created to provide the classpath.  During this process wildcard references 
> are expanded by inspecting the filesystem.  Since archives are extracted to a 
> different location and linked into the final location after the classpath jar 
> is created, resources referred to via wildcards which exist in localized 
> archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
> these entries are removed from the final classpath for the container they are 
> not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2720) Windows: Wildcard classpath variables not expanded against resources contained in archives

2014-10-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178897#comment-14178897
 ] 

Chris Nauroth commented on YARN-2720:
-

The Findbugs warnings are unrelated.  I'll commit this.

> Windows: Wildcard classpath variables not expanded against resources 
> contained in archives
> --
>
> Key: YARN-2720
> URL: https://issues.apache.org/jira/browse/YARN-2720
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2720.2.patch, YARN-2720.3.patch, YARN-2720.4.patch
>
>
> On windows there are limitations to the length of command lines and 
> environment variables which prevent placing all classpath resources into 
> these elements.  Instead, a jar containing only a classpath manifest is 
> created to provide the classpath.  During this process wildcard references 
> are expanded by inspecting the filesystem.  Since archives are extracted to a 
> different location and linked into the final location after the classpath jar 
> is created, resources referred to via wildcards which exist in localized 
> archives  (.zip, tar.gz) are not added to the classpath manifest jar.  Since 
> these entries are removed from the final classpath for the container they are 
> not on the container's classpath as they should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2700) TestSecureRMRegistryOperations failing on windows: auth problems

2014-10-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2700:

Hadoop Flags: Reviewed

+1 for the patch, pending Jenkins.  Thanks for the fix, Steve.

> TestSecureRMRegistryOperations failing on windows: auth problems
> 
>
> Key: YARN-2700
> URL: https://issues.apache.org/jira/browse/YARN-2700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.6.0
> Environment: Windows Server, Win7
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2700-001.patch
>
>
> TestSecureRMRegistryOperations failing on windows: unable to create the root 
> /registry path with permissions problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2700) TestSecureRMRegistryOperations failing on windows: auth problems

2014-10-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180086#comment-14180086
 ] 

Chris Nauroth commented on YARN-2700:
-

bq. ...pending Jenkins...

Never mind.  It looks like Jenkins and I had a race condition commenting.  :-)  
You have a full +1 from me now.

> TestSecureRMRegistryOperations failing on windows: auth problems
> 
>
> Key: YARN-2700
> URL: https://issues.apache.org/jira/browse/YARN-2700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.6.0
> Environment: Windows Server, Win7
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2700-001.patch
>
>
> TestSecureRMRegistryOperations failing on windows: unable to create the root 
> /registry path with permissions problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid

2014-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2677:

Hadoop Flags: Reviewed

+1 for the patch.  I verified this on both Mac and Windows.  Thanks, Steve!

> registry punycoding of usernames doesn't fix all usernames to be DNS-valid
> --
>
> Key: YARN-2677
> URL: https://issues.apache.org/jira/browse/YARN-2677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2677-001.patch, YARN-2677-002.patch
>
>
> The registry has a restriction "DNS-valid names only" to retain the future 
> option of DNS exporting of the registry.
> to handle complex usernames, it punycodes the username first, using Java's 
> {{java.net.IDN}} class.
> This turns out to only map high Unicode -> ASCII, and does nothing for 
> ASCII-but-invalid-hostname chars, thereby stopping users with DNS-illegal 
> names (e.g. with an underscore in them) from being able to register.
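
A small illustration of the limitation described above; the usernames are made up, 
and the behavior noted in the comments is the expected {{java.net.IDN}} behavior 
rather than output captured from the registry code.

{code}
import java.net.IDN;

public class IdnUnderscoreSketch {
  public static void main(String[] args) {
    // High Unicode is punycoded to an ASCII "xn--" form...
    System.out.println(IDN.toASCII("zoë"));
    // ...but an ASCII underscore, which is DNS-illegal, passes straight through.
    System.out.println(IDN.toASCII("dr_who"));  // expected: "dr_who"
    try {
      // Only the stricter STD3 rules reject non-LDH ASCII characters.
      IDN.toASCII("dr_who", IDN.USE_STD3_ASCII_RULES);
    } catch (IllegalArgumentException expected) {
      System.out.println("STD3 rules reject the underscore");
    }
  }
}
{code}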



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created YARN-2803:
---

 Summary: MR distributed cache not working correctly on Windows 
after NodeManager privileged account changes.
 Key: YARN-2803
 URL: https://issues.apache.org/jira/browse/YARN-2803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Priority: Critical


This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
{{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running git 
bisect, I traced it to the YARN-2198 patch to remove the need to run 
NodeManager as a privileged account.  The tests started failing when that patch 
was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195451#comment-14195451
 ] 

Chris Nauroth commented on YARN-2803:
-

Here is the stack trace from a failure.

{code}
testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 1
6.844 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs._testDistributedCache(TestMRJobs.java:881)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testDistributedCache(TestMRJobs.java:891)
{code}

The task log shows the assertion failing when it tries to find 
job.jar/lib/lib2.jar.

{code}
2014-11-03 15:36:33,652 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error 
running child : java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at org.junit.Assert.assertNotNull(Assert.java:631)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs$DistributedCacheChecker.setup(TestMRJobs.java:764)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1640)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
{code}


> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Priority: Critical
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-11-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195453#comment-14195453
 ] 

Chris Nauroth commented on YARN-2198:
-

It appears that this patch has broken some MR distributed cache functionality 
on Windows, or at least caused a failure in 
{{TestMRJobs#testDistributedCache}}.  Please see YARN-2803 for more details.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
> YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
> YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.16.patch, 
> YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, 
> YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, 
> YARN-2198.separation.patch, YARN-2198.trunk.10.patch, 
> YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, 
> YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However this 
> executor requires the process launching the container to be LocalSystem or a 
> member of the local Administrators group. Since the process in question is 
> the NodeManager, the requirement translates to the entire NM to run as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2803:

Assignee: Craig Welch
Hadoop Flags: Reviewed

+1 for the patch.  I verified that {{TestMRJobs}} and {{TestUberAM}} pass in my 
Windows environment.

I'm going to hold off on committing until tomorrow in case anyone else watching 
wants to comment regarding secure mode.  I do think we need to commit this, 
because without it, we have a regression in non-secure mode on Windows, which 
has been shipping for several releases already.  Secure mode is still under 
development as I understand it.

> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2803.0.patch
>
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-11-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201674#comment-14201674
 ] 

Chris Nauroth commented on YARN-2198:
-

This patch caused {{TestWinUtils#testChmod}} to fail.  I submitted a patch on 
HADOOP-11280 to fix the test.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
> YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
> YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.16.patch, 
> YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, 
> YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, 
> YARN-2198.separation.patch, YARN-2198.trunk.10.patch, 
> YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, 
> YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However this 
> executor requires the process launching the container to be LocalSystem or a 
> member of the local Administrators group. Since the process in question is 
> the NodeManager, the requirement translates to the entire NM to run as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202571#comment-14202571
 ] 

Chris Nauroth commented on YARN-2803:
-

Thanks again, Craig.  I re-verified that the tests pass in my environment with 
this version of the patch.  I agree with the argument to retain the current 
behavior of secure mode (such as it is).

Sorry to nitpick, but it looks like some lines are indented by 1 space instead 
of 2.  Would you mind fixing that?  I'll be +1 after that.

{code}
if (exec instanceof WindowsSecureContainerExecutor) {
 jarDir = nmPrivateClasspathJarDir;
} else {
 jarDir = pwd; 
}
{code}


> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2803.0.patch, YARN-2803.1.patch
>
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202602#comment-14202602
 ] 

Chris Nauroth commented on YARN-2803:
-

+1 for the v2 patch.  I'll commit this.

> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2803.0.patch, YARN-2803.1.patch, YARN-2803.2.patch
>
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-05-14 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657684#comment-13657684
 ] 

Chris Nauroth commented on YARN-535:


+1 for the patch.  Thanks for making the change in {{TestDistributedShell}} 
too.  I verified the tests on both Mac and Windows.

> TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
> write phase, breaks later test runs
> 
>
> Key: YARN-535
> URL: https://issues.apache.org/jira/browse/YARN-535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 3.0.0
> Environment: OS/X laptop, HFS+ filesystem
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-535-02.patch, YARN-535.patch
>
>
> the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
> As {{Configuration.writeXml()}} does a reread of all resources, this will 
> break if the (open-for-writing) resource is already visible as an empty file. 
> This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
> later test runs -because it is not overwritten by later incremental builds, 
> due to timestamps.
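
For illustration, a hedged sketch of one way to avoid the race described above (not 
the actual patch): serialize the {{Configuration}} to memory first so the on-disk 
resource is never visible as an empty file while {{writeXml()}} re-reads resources.

{code}
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class SafeConfWriteSketch {
  public static void write(Configuration conf, String path) throws IOException {
    // writeXml() re-reads all resources; buffering keeps the target file
    // untouched (and non-empty) until the serialized bytes are ready.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    conf.writeXml(buffer);
    try (FileOutputStream out = new FileOutputStream(path)) {
      buffer.writeTo(out);
    }
  }
}
{code}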

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending mismatch

2013-05-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662494#comment-13662494
 ] 

Chris Nauroth commented on YARN-700:


+1 for the patch.  I verified that the test passes on Mac and Windows.  Thank 
you, Ivan!

> TestInfoBlock fails on Windows because of line ending mismatch
> ---
>
> Key: YARN-700
> URL: https://issues.apache.org/jira/browse/YARN-700
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
> Attachments: YARN-700.patch
>
>
> Exception:
> {noformat}
> Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec <<< 
> FAILURE!
> testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
> Time elapsed: 873 sec  <<< FAILURE!
> java.lang.AssertionError: 
>   at org.junit.Assert.fail(Assert.java:91)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertTrue(Assert.java:54)
>   at 
> org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-715) TestDistributedShell and TestUnmanagedAMLauncher are failing

2013-05-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664256#comment-13664256
 ] 

Chris Nauroth commented on YARN-715:


Is this a duplicate of YARN-699?

> TestDistributedShell and TestUnmanagedAMLauncher are failing
> 
>
> Key: YARN-715
> URL: https://issues.apache.org/jira/browse/YARN-715
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Omkar Vinit Joshi
>
> Tests are timing out. Looks like this is related to YARN-617.
> {code}
> 2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] 
> containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to 
> start container.
> Expected containerId: user Found: container_1369183214008_0001_01_01
> 2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] 
> security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - 
> PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
> Expected containerId: user Found: container_1369183214008_0001_01_01
> 2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server 
> (Server.java:run(1864)) - IPC Server handler 0 on 54024, call 
> org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
> Expected containerId: user Found: container_1369183214008_0001_01_01
> org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request 
> to start container.
> Expected containerId: user Found: container_1369183214008_0001_01_01
>   at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-715) TestDistributedShell and TestUnmanagedAMLauncher are failing

2013-05-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664360#comment-13664360
 ] 

Chris Nauroth commented on YARN-715:


{quote}
Seems like a bug in DSShell that does not handle failed container launches 
properly.
{quote}

Perhaps it's related to this comment on YARN-417:

https://issues.apache.org/jira/browse/YARN-417?focusedCommentId=13609801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13609801

{quote}
Prior to YARN-417, ApplicationMaster would check for being done at a regular 
interval. Now, using the AMRMClientAsync, it only checks on container 
completion, which never occurs because no containers are run.
{quote}

Perhaps because of only checking on container completion, if no container ever 
completes successfully, then the AM never knows to exit, and the process 
appears to hang.
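
To make that failure mode concrete, a hypothetical sketch (class and counts are 
illustrative, not the distributed shell AM, and it is written against the 
{{org.apache.hadoop.yarn.client.api.async}} layout of later releases): when the done 
check lives only in {{onContainersCompleted()}}, an AM that never launches a 
container never receives the callback and so never knows to unregister.

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class CallbackOnlyDoneCheck implements AMRMClientAsync.CallbackHandler {
  private final int expected = 1;   // illustrative
  private int completed;

  @Override
  public synchronized void onContainersCompleted(List<ContainerStatus> statuses) {
    completed += statuses.size();
    if (completed >= expected) {
      // Signal the AM main thread to unregister and exit. If no container is
      // ever started, this branch is never reached and the AM appears to hang.
    }
  }

  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public void onError(Throwable e) { }
  @Override public float getProgress() { return 0; }
}
{code}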

> TestDistributedShell and TestUnmanagedAMLauncher are failing
> 
>
> Key: YARN-715
> URL: https://issues.apache.org/jira/browse/YARN-715
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Omkar Vinit Joshi
>
> Tests are timing out. Looks like this is related to YARN-617.
> {code}
> 2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] 
> containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to 
> start container.
> Expected containerId: user Found: container_1369183214008_0001_01_01
> 2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] 
> security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - 
> PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
> Expected containerId: user Found: container_1369183214008_0001_01_01
> 2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server 
> (Server.java:run(1864)) - IPC Server handler 0 on 54024, call 
> org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
> Expected containerId: user Found: container_1369183214008_0001_01_01
> org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request 
> to start container.
> Expected containerId: user Found: container_1369183214008_0001_01_01
>   at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path

2013-06-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679703#comment-13679703
 ] 

Chris Nauroth commented on YARN-766:


Hi Sid,

There are a couple of other minor differences between trunk and branch-2 for 
{{TestNodeManagerShutdown}}.  Would you mind including those in your patch too, 
just so the files are identical and easier to maintain between the 2 branches?  
Below is the full output I'm seeing from {{git diff trunk branch-2 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java}}
 .

Thank you!

{code}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apa
index e0db826..95c1c10 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/had
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/had
@@ -149,8 +149,8 @@ public void testKillContainersOnShutdown() throws 
IOException,
   }
 
   public static void startContainer(NodeManager nm, ContainerId cId,
-  FileContext localFS, File scriptFileDir, File processStartFile)
-  throws IOException, YarnException {
+  FileContext localFS, File scriptFileDir, File processStartFile) 
+  throws IOException, YarnException {
 File scriptFile =
 createUnhaltingScriptFile(cId, scriptFileDir, processStartFile);
 
@@ -158,7 +158,7 @@ public static void startContainer(NodeManager nm, 
ContainerId cId,
 recordFactory.newRecordInstance(ContainerLaunchContext.class);
 
 NodeId nodeId = BuilderUtils.newNodeId("localhost", 1234);
-
+ 
 URL localResourceUri =
 ConverterUtils.getYarnUrlFromPath(localFS
 .makeQualified(new Path(scriptFile.getAbsolutePath(;
@@ -235,7 +235,7 @@ private YarnConfiguration createNMConfig() {
*/
   private static File createUnhaltingScriptFile(ContainerId cId,
   File scriptFileDir, File processStartFile) throws IOException {
-File scriptFile = Shell.appendScriptExtension(scriptFileDir, "scriptFile");
+File scriptFile = new File(scriptFileDir, "scriptFile.sh");
 PrintWriter fileWriter = new PrintWriter(scriptFile);
 if (Shell.WINDOWS) {
   fileWriter.println("@echo \"Running testscript for delayed kill\"");
@@ -272,4 +272,4 @@ public void setMasterKey(MasterKey masterKey) {
   getNMContext().getContainerTokenSecretManager().setMasterKey(masterKey);
 }
   }
-}
\ No newline at end of file
+}
{code}


> TestNodeManagerShutdown should use Shell to form the output path
> 
>
> Key: YARN-766
> URL: https://issues.apache.org/jira/browse/YARN-766
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Siddharth Seth
>Priority: Minor
> Attachments: YARN-766.txt
>
>
> File scriptFile = new File(tmpDir, "scriptFile.sh");
> should be replaced with
> File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");
> to match trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

