[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156127#comment-14156127 ] Zhijie Shen commented on YARN-2468: --- bq. I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check also happens before LogKey.write(); otherwise, we will write the key but no value. I think Vinod meant that pendingUploadFiles is needed, but doesn't need to be a member variable. getPendingLogFilesToUploadForThisContainer can return this collection, which can then be passed into LogValue.write as an additional parameter. 2. IMHO, the following code can be improved. If we use an iterator, we can remove the unnecessary elements on the fly. {code} for (File file : candidates) { Matcher fileMatcher = filterPattern.matcher(file.getName()); if (fileMatcher.find()) { filteredFiles.add(file); } } if (!exclusion) { return filteredFiles; } else { candidates.removeAll(filteredFiles); return candidates; } {code} This block could be: {code} ... while(candidatesItr.hasNext()) { candidate = candidatesItr.next(); ... if ((not match inclusive) || (match exclusive)) { candidatesItr.remove() } } {code} 3. [~jianhe] mentioned to me before that the following condition is not always a reliable way to determine an AM container. Any idea? It also seems that we don't need shouldUploadLogsForRunningContainer; we can re-use shouldUploadLogs and set wasContainerSuccessful to true. Personally, if it's not trivial to identify the AM container, I prefer to write a TODO comment and leave it until we implement the log retention API. {code} if (containerId.getId() == 1) { return true; } {code} bq. It seems to be, let's validate this via a test-case. Is it addressed by {code} this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600); {code} ? If so, would it be better to add a line of comment explaining the rationale behind this config? 5. Can the following code {code} Set<ContainerId> finishedContainers = new HashSet<ContainerId>(); for (ContainerId id : pendingContainerInThisCycle) { finishedContainers.add(id); } {code} be simplified as {code} Set<ContainerId> finishedContainers = new HashSet<ContainerId>(pendingContainerInThisCycle); {code} Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch Currently, when an application finishes, the NM starts log aggregation. But for long-running service (LRS) applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
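For reference, a minimal sketch of the single-pass iterator approach suggested in point 2 above could look like the following (illustration only, not the attached patch; it assumes candidates is a modifiable java.util.List<File>, filterPattern the compiled java.util.regex.Pattern already in the code, and exclusion the existing boolean flag):
{code}
// Sketch only: remove non-matching files (inclusive mode) or matching files
// (exclusive mode) in place, so no second collection or removeAll() pass is needed.
Iterator<File> candidatesItr = candidates.iterator();
while (candidatesItr.hasNext()) {
  File candidate = candidatesItr.next();
  boolean matches = filterPattern.matcher(candidate.getName()).find();
  if ((!exclusion && !matches) || (exclusion && matches)) {
    candidatesItr.remove();
  }
}
return candidates;
{code}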
[jira] [Created] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
Jun Gong created YARN-2640: -- Summary: TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong When running the test with mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:<rwxr-xr-x> but was:<rwxrwxr-x> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories, so directory dirA had already been created by testDiskSpaceUtilizationLimit. When testCreateDirectories then tried to create dirA with the specified permission, it found that dirA already existed and did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
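If the root cause is simply that dirA survives from an earlier test method, one way to make the test order-independent is to remove the directory before testCreateDirectories asserts on its permissions. A minimal sketch under that assumption (not the attached patch; testDir and dirA stand for the test's existing fields):
{code}
// Sketch only: make sure dirA does not survive from a previous test method,
// so DirectoryCollection actually creates it and applies the expected permissions.
File dirA = new File(testDir, "dirA");
if (dirA.exists()) {
  FileUtil.fullyDelete(dirA);   // org.apache.hadoop.fs.FileUtil
}
{code}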
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156192#comment-14156192 ] Junping Du commented on YARN-1972: -- Hi [~vinodkv], I think we should commit this patch to branch-2.6 given this JIRA is marked as fixed in 2.6. Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and of the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach taken with the WCE came from a practical trial-and-error process. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself is already dealing with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE's `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group, or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. 
This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement automatically implies that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service onto the project is not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
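The two properties from the Deployment Requirements section above would normally be placed in yarn-site.xml; the sketch below only restates the same settings through the Hadoop Configuration API, and the group name is a placeholder rather than a value from this JIRA:
{code}
// Sketch only: WCE deployment settings described above
// (org.apache.hadoop.conf.Configuration / org.apache.hadoop.yarn.conf.YarnConfiguration).
Configuration conf = new YarnConfiguration();
conf.set("yarn.nodemanager.container-executor.class",
    "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
// A Windows security group that the NM service principal belongs to
// (equivalent of yarn.nodemanager.linux-container-executor.group for the LCE).
conf.set("yarn.nodemanager.windows-secure-container-executor.group", "HadoopNMGroup");
{code}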
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156198#comment-14156198 ] Jun Gong commented on YARN-2617: I investigated why TestDirectoryCollection failed, and it might be because of YARN-2640. Could you help check and review it, please? Thank you. NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We ([~chenchun]) are testing RM work-preserving restart and found the following logs when we ran a simple MapReduce PI job. The NM continuously reported completed containers whose application had already finished, even after the AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean up already completed applications. But it only removes the appId from 'app.context.getApplications()' when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not receive this event for a long time, or might never receive it. * For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, before it is scheduled to delete the application logs and send the event. * For LogAggregationService, it might fail (e.g. if the user does not have HDFS write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
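As a rough illustration of the behaviour the title asks for (not the attached patch), the NM-side heartbeat code could skip completed containers whose application it already considers finished. completedContainers, containerStatusesToReport and isApplicationStillRunning() below are placeholder names standing in for whatever the real patch uses:
{code}
// Sketch only: report a finished container to the RM only while its application
// is still considered running on this NM.
for (ContainerStatus status : completedContainers) {
  ApplicationId appId =
      status.getContainerId().getApplicationAttemptId().getApplicationId();
  if (isApplicationStillRunning(appId)) {   // hypothetical helper
    containerStatusesToReport.add(status);
  }
}
{code}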
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: (was: YARN-2562.5.patch) ContainerId@toString() is unreadable for epoch 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch 0) after YARN-2182. For e.g, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5.patch ContainerId@toString() is unreadable for epoch 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch 0) after YARN-2182. For e.g, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-2640: --- Attachment: YARN-2640.patch Patch submitted. TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156228#comment-14156228 ] Hadoop QA commented on YARN-2562: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672512/YARN-2562.5.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5233//console This message is automatically generated. ContainerId@toString() is unreadable for epoch 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch 0) after YARN-2182. For e.g, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156230#comment-14156230 ] Hadoop QA commented on YARN-2640: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672513/YARN-2640.patch against trunk revision 9e40de6. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5234//console This message is automatically generated. TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-2640: --- Attachment: YARN-2640.2.patch TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.2.patch, YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156330#comment-14156330 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/CHANGES.txt NMClient doesn't have retries for supporting rolling-upgrades - Key: YARN-2613 URL: https://issues.apache.org/jira/browse/YARN-2613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch While NM is rolling upgrade, client should retry NM until it comes up. This jira is to add a NMProxy (similar to RMProxy) with retry implementation to support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
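As a hedged illustration of the "NMProxy (similar to RMProxy) with retry" idea described above (not the committed code), a client-side proxy can be built around one of the stock org.apache.hadoop.io.retry policies so that calls keep retrying while the NM restarts; the time values here are placeholders, not the defaults added by this patch:
{code}
// Sketch only: retry a failed call with a fixed sleep until a time limit, which
// lets an NMClient ride over a NodeManager that is being rolling-upgraded.
// Uses org.apache.hadoop.io.retry.RetryPolicies and java.util.concurrent.TimeUnit.
RetryPolicy retryPolicy = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
    3 * 60 * 1000L,   // keep trying for up to ~3 minutes
    10 * 1000L,       // wait 10 seconds between attempts
    TimeUnit.MILLISECONDS);
{code}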
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156357#comment-14156357 ] Hudson commented on YARN-1063: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-1063. Augmented Hadoop common winutils to have the ability to create containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java * hadoop-common-project/hadoop-common/src/main/winutils/symlink.c * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows, access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security token of a sandboxed process launched by a web browser. Typically the launched process will have a fully restricted token and needs to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The Container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched process without granting rights to other processes launched on the same machine, but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of Windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT based executables. This method was ruled out due to the lack of official support for standard Windows APIs. At some point in the future, Windows may support functionality similar to BSD jails or Linux containers; at that point, support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub command was added to the set of task commands. 
Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new environment
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156349#comment-14156349 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/CHANGES.txt TestDistributedShell#testDSRestartWithPreviousRunningContainers fails - Key: YARN-2630 URL: https://issues.apache.org/jira/browse/YARN-2630 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, YARN-2630.4.patch The problem is that after YARN-1372, in work-preserving AM restart, the re-launched AM will also receive previously failed AM container. But DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156343#comment-14156343 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrasturcture to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overwrrides some emthods to the effect of: * change the DCE created user cache directories to be owned by the job user and by the nodemanager group. 
* changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as standalone process instead of an in-process Java method call. This in turn relies on the winutil createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does no delegate the creation of the user cache directories to the native implementation. * it does no require special handling to be able to delete user files The approach on the WCE came from a practical trial-and-error approach. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop environment container executions. The job container itself is already dealing with this via a so called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container the same issue had to be resolved and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set the `yarn.nodemanager.container-executor.class` to
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156334#comment-14156334 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/CHANGES.txt Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen 
Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156364#comment-14156364 ] Hadoop QA commented on YARN-2640: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672530/YARN-2640.2.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5235//console This message is automatically generated. TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.2.patch, YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156435#comment-14156435 ] Hudson commented on YARN-1063: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-1063. Augmented Hadoop common winutils to have the ability to create containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda) * hadoop-common-project/hadoop-common/src/main/winutils/symlink.c * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java * hadoop-yarn-project/CHANGES.txt Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security tone of a sandboxed process launch be a web browser. Typically the launched process will have a fully restricted token and need to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The Container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched without granting rights to other processes launched on the same machine but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT based executables. This method was ruled out due to the lack of official support for standard windows APIs. At some point in the future windows may support functionality similar to BSD jails or Linux containers, at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub command was added to the set of task commands. 
Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new environment
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156427#comment-14156427 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java TestDistributedShell#testDSRestartWithPreviousRunningContainers fails - Key: YARN-2630 URL: https://issues.apache.org/jira/browse/YARN-2630 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, YARN-2630.4.patch The problem is that after YARN-1372, in work-preserving AM restart, the re-launched AM will also receive previously failed AM container. But DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156421#comment-14156421 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrasturcture to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overwrrides some emthods to the effect of: * change the DCE created user cache directories to be owned by the job user and by the nodemanager group. 
* changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as standalone process instead of an in-process Java method call. This in turn relies on the winutil createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does no delegate the creation of the user cache directories to the native implementation. * it does no require special handling to be able to delete user files The approach on the WCE came from a practical trial-and-error approach. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop environment container executions. The job container itself is already dealing with this via a so called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container the same issue had to be resolved and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set the `yarn.nodemanager.container-executor.class` to
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156408#comment-14156408 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/CHANGES.txt NMClient doesn't have retries for supporting rolling-upgrades - Key: YARN-2613 URL: https://issues.apache.org/jira/browse/YARN-2613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch While NM is rolling upgrade, client should retry NM until it comes up. This jira is to add a NMProxy (similar to RMProxy) with retry implementation to support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156412#comment-14156412 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen 
Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5-2.patch ContainerId@toString() is unreadable for epoch > 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch > 0) after YARN-2182. For example, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156461#comment-14156461 ] Tsuyoshi OZAWA commented on YARN-2640: -- [~hex108], thanks for your contribution. Can we close this jira as a duplicate of YARN-1979? TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.2.patch, YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156462#comment-14156462 ] Tsuyoshi OZAWA commented on YARN-1979: -- [~djp], do you mind taking a look at the latest patch? Some users have reported the same issue, e.g. YARN-2640. TestDirectoryCollection fails when the umask is unusual --- Key: YARN-1979 URL: https://issues.apache.org/jira/browse/YARN-1979 Project: Hadoop YARN Issue Type: Test Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1979.2.patch, YARN-1979.txt I've seen this fail in Windows where the default permissions are matching up to 700. {code} --- Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection --- Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.422 sec FAILURE! java.lang.AssertionError: local dir parent Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA not created with proper permissions expected:rwxr-xr-x but was:rwx-- at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) {code} The clash is between testDiskSpaceUtilizationLimit() and testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156490#comment-14156490 ] Wei Yan commented on YARN-2635: --- All tests passed locally. The TestDirectoryCollection failure looks related to YARN-1979, YARN-2640. TestRMRestart fails with FairScheduler -- Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156528#comment-14156528 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie 
Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156537#comment-14156537 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. 
* changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files. The WCE approach came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself is already dealing with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launched as a separate container, the same issue had to be resolved and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set the `yarn.nodemanager.container-executor.class` to
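For illustration, a minimal yarn-site.xml sketch of that setting; the fully qualified class name is not stated in the truncated sentence above and is assumed from the WindowsSecureContainerExecutor.java path listed in the commit:
{code}
<!-- Sketch only: enable the Windows Secure Container Executor on a NodeManager.
     The class name below is inferred from the commit's file listing, not quoted from the text. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
{code}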
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156524#comment-14156524 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java NMClient doesn't have retries for supporting rolling-upgrades - Key: YARN-2613 URL: https://issues.apache.org/jira/browse/YARN-2613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch While NM is rolling upgrade, client should retry NM until it comes up. This jira is to add a NMProxy (similar to RMProxy) with retry implementation to support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156552#comment-14156552 ] Hudson commented on YARN-1063: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-1063. Augmented Hadoop common winutils to have the ability to create containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-common-project/hadoop-common/src/main/winutils/symlink.c * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-yarn-project/CHANGES.txt Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Fix For: 2.6.0 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security token of a sandboxed process launched by a web browser. Typically the launched process will have a fully restricted token and needs to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The Container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched process without granting rights to other processes launched on the same machine, but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of Windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT-based executables. This method was ruled out due to the lack of official support for standard Windows APIs. At some point in the future Windows may support functionality similar to BSD jails or Linux containers; at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub command was added to the set of task commands. 
Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new
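A hypothetical invocation following the syntax above; the task name, user, and command line are made-up placeholder values, not taken from the patch:
{code}
REM Sketch only: launch a command as the given domain user via the new sub-command.
REM task_001, testuser@EXAMPLE.COM and the quoted command line are placeholders.
winutils task createAsUser task_001 testuser@EXAMPLE.COM "cmd /c whoami"
{code}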
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156543#comment-14156543 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto TestDistributedShell#testDSRestartWithPreviousRunningContainers fails - Key: YARN-2630 URL: https://issues.apache.org/jira/browse/YARN-2630 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, YARN-2630.4.patch The problem is that after YARN-1372, in work-preserving AM restart, the re-launched AM will also receive previously failed AM container. But DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156618#comment-14156618 ] Junping Du commented on YARN-1979: -- Thanks [~ozawa] for reminding me about this. Yes, I did forget this JIRA. +1. Committing it now. TestDirectoryCollection fails when the umask is unusual --- Key: YARN-1979 URL: https://issues.apache.org/jira/browse/YARN-1979 Project: Hadoop YARN Issue Type: Test Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1979.2.patch, YARN-1979.txt I've seen this fail in Windows where the default permissions are matching up to 700. {code} --- Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection --- Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.422 sec FAILURE! java.lang.AssertionError: local dir parent Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA not created with proper permissions expected:rwxr-xr-x but was:rwx-- at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) {code} The clash is between testDiskSpaceUtilizationLimit() and testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2615: - Attachment: YARN-2615-v2.patch In v2 patch, - Fix test failures and audit warning. - Add more tests for RMDelegationToken and TimelineDelegationToken. ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields Key: YARN-2615 URL: https://issues.apache.org/jira/browse/YARN-2615 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2615-v2.patch, YARN-2615.patch As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier and DelegationTokenIdentifier should also be updated in the same way to allow fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156647#comment-14156647 ] Hudson commented on YARN-1979: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6174 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6174/]) YARN-1979. TestDirectoryCollection fails when the umask is unusual. (Contributed by Vinod Kumar Vavilapalli and Tsuyoshi OZAWA) (junping_du: rev c7cee9b4551918d5d35bf4e9dc73982a050c73ba) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java TestDirectoryCollection fails when the umask is unusual --- Key: YARN-1979 URL: https://issues.apache.org/jira/browse/YARN-1979 Project: Hadoop YARN Issue Type: Test Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.7.0 Attachments: YARN-1979.2.patch, YARN-1979.txt I've seen this fail in Windows where the default permissions are matching up to 700. {code} --- Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection --- Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.422 sec FAILURE! java.lang.AssertionError: local dir parent Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA not created with proper permissions expected:rwxr-xr-x but was:rwx-- at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) {code} The clash is between testDiskSpaceUtilizationLimit() and testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156653#comment-14156653 ] Tsuyoshi OZAWA commented on YARN-1979: -- Thanks Vinod for the contribution and Junping for the review! TestDirectoryCollection fails when the umask is unusual --- Key: YARN-1979 URL: https://issues.apache.org/jira/browse/YARN-1979 Project: Hadoop YARN Issue Type: Test Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.7.0 Attachments: YARN-1979.2.patch, YARN-1979.txt I've seen this fail in Windows where the default permissions are matching up to 700. {code} --- Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection --- Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.422 sec FAILURE! java.lang.AssertionError: local dir parent Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA not created with proper permissions expected:rwxr-xr-x but was:rwx-- at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) {code} The clash is between testDiskSpaceUtilizationLimit() and testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156651#comment-14156651 ] Hadoop QA commented on YARN-2615: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672553/YARN-2615-v2.patch against trunk revision c7cee9b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5237//console This message is automatically generated. ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields Key: YARN-2615 URL: https://issues.apache.org/jira/browse/YARN-2615 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2615-v2.patch, YARN-2615.patch As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier and DelegationTokenIdentifier should also be updated in the same way to allow fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156655#comment-14156655 ] Tsuyoshi OZAWA commented on YARN-2615: -- [~djp], currently it seems the YARN build is broken on Jenkins CI. I faced the same issue on YARN-2562. ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields Key: YARN-2615 URL: https://issues.apache.org/jira/browse/YARN-2615 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2615-v2.patch, YARN-2615.patch As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier and DelegationTokenIdentifier should also be updated in the same way to allow fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156761#comment-14156761 ] Jian He commented on YARN-2617: --- YARN-2640 seems resolved in YARN-1979 already. NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We ([~chenchun]) are testing RM work-preserving restart and found the following logs when we ran a simple MapReduce task (PI). NM continuously reported completed containers whose Application had already finished, after the AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean up already completed applications. But it will only remove appId from 'app.context.getApplications()' when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, NM might not receive this event for a long time, or might never receive it. * For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, before it is scheduled to delete Application logs and send the event. * For LogAggregationService, it might fail (e.g. if the user does not have HDFS write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156772#comment-14156772 ] Hudson commented on YARN-2617: -- FAILURE: Integrated in Hadoop-trunk-Commit #6176 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6176/]) YARN-2617. Fixed NM to not send duplicate container status whose app is not running. Contributed by Jun Gong (jianhe: rev 3ef1cf187faeb530e74606dd7113fd1ba08140d7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We ([~chenchun]) are testing RM work-preserving restart and found the following logs when we ran a simple MapReduce task (PI). NM continuously reported completed containers whose Application had already finished, after the AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean up already completed applications. But it will only remove appId from 'app.context.getApplications()' when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, NM might not receive this event for a long time, or might never receive it. * For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, before it is scheduled to delete Application logs and send the event. * For LogAggregationService, it might fail (e.g. if the user does not have HDFS write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
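For reference, a minimal yarn-site.xml sketch of the retention setting described in the YARN-2617 report above; yarn.nodemanager.log.retain-seconds is assumed to be the property behind YarnConfiguration.NM_LOG_RETAIN_SECONDS, and 10800 matches the 3 * 60 * 60 second default cited there:
{code}
<!-- Sketch only: how long the NM keeps local container logs when log aggregation is disabled.
     The property name is assumed to back YarnConfiguration.NM_LOG_RETAIN_SECONDS. -->
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>
{code}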
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: YARN-2527.patch Thanks for the code, [~zjshen]. I have updated the patch based on the comment. NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: (was: YARN-2527.patch) NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156805#comment-14156805 ] Karthik Kambatla commented on YARN-2624: The patch looks good to me. Would like input from someone more familiar with the NM restart code. [~jlowe], [~djp] - can either of you take a look? We would like to get this committed soon. Resource Localization fails on a cluster due to existing cache directories -- Key: YARN-2624 URL: https://issues.apache.org/jira/browse/YARN-2624 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Blocker Attachments: YARN-2624.001.patch, YARN-2624.001.patch We have found resource localization fails on a cluster with following error in certain cases. {noformat} INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27 at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156803#comment-14156803 ] Hudson commented on YARN-2254: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6177 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6177/]) YARN-2254. TestRMWebServicesAppsModification should run against both CS and FS. (Zhihai Xu via kasha) (kasha: rev 5e0b49da9caa53814581508e589f3704592cf335) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java TestRMWebServicesAppsModification should run against both CS and FS --- Key: YARN-2254 URL: https://issues.apache.org/jira/browse/YARN-2254 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Labels: test Fix For: 2.7.0 Attachments: YARN-2254.000.patch, YARN-2254.001.patch, YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch TestRMWebServicesAppsModification skips the test, if the scheduler is not CapacityScheduler. change TestRMWebServicesAppsModification to support both CapacityScheduler and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: YARN-2527.patch NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156814#comment-14156814 ] Siqi Li commented on YARN-1414: --- Sure, I will submit a rebased patch shortly. with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156824#comment-14156824 ] Jason Lowe commented on YARN-2624: -- Thanks for catching and fixing this, Anubhav! My apologies for missing this scenario in the original JIRA. +1 lgtm. Committing this. Resource Localization fails on a cluster due to existing cache directories -- Key: YARN-2624 URL: https://issues.apache.org/jira/browse/YARN-2624 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Blocker Attachments: YARN-2624.001.patch, YARN-2624.001.patch We have found resource localization fails on a cluster with following error in certain cases. {noformat} INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27 at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156835#comment-14156835 ] Karthik Kambatla commented on YARN-2624: Thanks for super-quick turnaround, Jason. Resource Localization fails on a cluster due to existing cache directories -- Key: YARN-2624 URL: https://issues.apache.org/jira/browse/YARN-2624 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Blocker Attachments: YARN-2624.001.patch, YARN-2624.001.patch We have found resource localization fails on a cluster with following error in certain cases. {noformat} INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27 at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156836#comment-14156836 ] Ray Chiang commented on YARN-2635: -- Looks good to me. Ran cleanly in my tree. +1 TestRMRestart fails with FairScheduler -- Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156841#comment-14156841 ] Hudson commented on YARN-2624: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6178 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6178/]) YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java Resource Localization fails on a cluster due to existing cache directories -- Key: YARN-2624 URL: https://issues.apache.org/jira/browse/YARN-2624 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Blocker Attachments: YARN-2624.001.patch, YARN-2624.001.patch We have found resource localization fails on a cluster with following error in certain cases. {noformat} INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27 at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156867#comment-14156867 ] Ray Chiang commented on YARN-2638: -- This patch fixes the test for me. +1 Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) -- Key: YARN-2638 URL: https://issues.apache.org/jira/browse/YARN-2638 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2638-1.patch TestRM fails when using FairScheduler or FifoScheduler. The failures are not shown in trunk as the trunk uses the default capacity scheduler. We need to let TestRM run with all types of schedulers, to make sure any new change won't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156869#comment-14156869 ] Anubhav Dhoot commented on YARN-2624: - Thanks [~jlowe]! Resource Localization fails on a cluster due to existing cache directories -- Key: YARN-2624 URL: https://issues.apache.org/jira/browse/YARN-2624 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2624.001.patch, YARN-2624.001.patch We have found resource localization fails on a cluster with following error in certain cases. {noformat} INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://blahhostname:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27 at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156888#comment-14156888 ] zhihai xu commented on YARN-2254: - thanks [~kasha] for reviewing and committing the patch. TestRMWebServicesAppsModification should run against both CS and FS --- Key: YARN-2254 URL: https://issues.apache.org/jira/browse/YARN-2254 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Labels: test Fix For: 2.7.0 Attachments: YARN-2254.000.patch, YARN-2254.001.patch, YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch TestRMWebServicesAppsModification skips the test, if the scheduler is not CapacityScheduler. change TestRMWebServicesAppsModification to support both CapacityScheduler and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156890#comment-14156890 ] Hadoop QA commented on YARN-2527: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672583/YARN-2527.patch against trunk revision 5e0b49d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5238//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5238//console This message is automatically generated. NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156892#comment-14156892 ] Siqi Li commented on YARN-1414: --- I just found out that this problem has been fixed in the trunk. I am going to close this jira with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2639: -- Attachment: YARN-2639-2.patch re-trigger the jenkins TestClientToAMTokens should run with all types of schedulers Key: YARN-2639 URL: https://issues.apache.org/jira/browse/YARN-2639 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2639-1.patch, YARN-2639-2.patch TestClientToAMTokens fails with FairScheduler now. We should let TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156905#comment-14156905 ] Karthik Kambatla commented on YARN-2180: Looks mostly good, but for these minor comments: # App-checker and the store implementations aren't related: ## the app-checker config should be appended to SHARED_CACHE_PREFIX and IN_MEMORY_STORE ## the variable names should be updated accordingly. ## InMemorySCMStore#createAppCheckerService should move to a util class - how about changing SharedCacheStructureUtil to SharedCacheUtil and adding this method there? # Can we create a follow-up blocker sub-task to revisit all the config names before we include sharedcache work in a release? In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, YARN-2180-trunk-v6.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
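To make comment #1 concrete, the key layout being asked for might look roughly like the sketch below; the yarn.sharedcache. prefix follows the existing shared-cache configuration work, while the specific suffixes and constant names are only illustrative guesses, not the patch's actual names.
{code}
public final class SharedCacheConfigSketch {
  // Store-agnostic settings hang directly off the shared-cache prefix ...
  public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
  public static final String SCM_APP_CHECKER_CLASS =
      SHARED_CACHE_PREFIX + "app-checker.class";           // hypothetical key

  // ... while store-specific settings stay under the in-memory store prefix.
  public static final String IN_MEMORY_STORE_PREFIX =
      SHARED_CACHE_PREFIX + "store.in-memory.";            // hypothetical key
  public static final String IN_MEMORY_STALENESS_PERIOD_MINS =
      IN_MEMORY_STORE_PREFIX + "staleness-period-mins";    // hypothetical key
}
{code}
Under this layout, the createAppCheckerService helper would live next to these constants in the renamed SharedCacheUtil rather than in InMemorySCMStore, as the review suggests.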
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156933#comment-14156933 ] Karthik Kambatla commented on YARN-2635: +1. Committing this. TestRMRestart fails with FairScheduler -- Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2635: --- Summary: TestRMRestart should run with all schedulers (was: TestRMRestart fails with FairScheduler) TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2638) TestRM should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2638: --- Summary: TestRM should run with all schedulers (was: Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)) TestRM should run with all schedulers - Key: YARN-2638 URL: https://issues.apache.org/jira/browse/YARN-2638 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2638-1.patch TestRM fails when using FairScheduler or FifoScheduler. The failures not shown in trunk as the trunk uses the default capacity scheduler. We need to let TestRM run with all types of schedulers, to make sure any new change wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-2634: - Assignee: Jian He Test failure for TestClientRMTokens --- Key: YARN-2634 URL: https://issues.apache.org/jira/browse/YARN-2634 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du Assignee: Jian He Priority: Blocker The test get failed as below: {noformat} --- Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens --- Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 22.693 sec FAILURE! java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272) testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 20.087 sec FAILURE! java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283) testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.031 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241) testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.061 sec FAILURE! java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261) testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.07 sec ERROR! 
java.lang.NullPointerException: null at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156951#comment-14156951 ] Jian He commented on YARN-2615: --- looks good, only few minor things: - {{ClientToAMTokenIdentifierForTest}}, the same code overrides from {{ClientToAMTokenIdentifier}} may be removed ? similarly for {{RMDelegationTokenIdentifierForTest}} - this code can be removed. {code} byte[] tokenIdentifierContent = token.getIdentifier(); ClientToAMTokenIdentifier tokenIdentifier = new ClientToAMTokenIdentifier(); DataInputBuffer dib = new DataInputBuffer(); dib.reset(tokenIdentifierContent, tokenIdentifierContent.length); tokenIdentifier.readFields(dib); {code} ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields Key: YARN-2615 URL: https://issues.apache.org/jira/browse/YARN-2615 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2615-v2.patch, YARN-2615.patch As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier and DelegationTokenIdentifier should also be updated in the same way to allow fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156963#comment-14156963 ] Karthik Kambatla commented on YARN-2635: Just saw YARN-2638 as well. On second thought, it might be better to club the two JIRAs and implement a base class for RM tests that run against all schedulers. And, schedulerType in these tests should probably be an enum so subclasses don't have to know the order. TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
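A rough sketch of what such a shared base class could look like, assuming JUnit 4's Parameterized runner (the class and method names here are illustrative, not taken from any attached patch); subclasses would extend it, pass the scheduler type through to the constructor, and call createConfiguration() instead of hard-coding a scheduler:
{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public abstract class ParameterizedSchedulerTestBase {

  // An enum (not an int index) means subclasses never depend on the
  // ordering of the @Parameters collection.
  public enum SchedulerType { CAPACITY, FAIR, FIFO }

  private final SchedulerType schedulerType;

  public ParameterizedSchedulerTestBase(SchedulerType type) {
    this.schedulerType = type;
  }

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {SchedulerType.CAPACITY}, {SchedulerType.FAIR}, {SchedulerType.FIFO}});
  }

  /** Build a configuration that points the RM at the scheduler under test. */
  protected YarnConfiguration createConfiguration() {
    YarnConfiguration conf = new YarnConfiguration();
    Class<? extends ResourceScheduler> clazz;
    switch (schedulerType) {
      case FAIR:
        clazz = FairScheduler.class;
        break;
      case FIFO:
        clazz = FifoScheduler.class;
        break;
      default:
        clazz = CapacityScheduler.class;
    }
    conf.setClass(YarnConfiguration.RM_SCHEDULER, clazz, ResourceScheduler.class);
    return conf;
  }
}
{code}
A test like TestRMRestart would then only declare a constructor such as public TestRMRestart(SchedulerType type) { super(type); } and otherwise stay scheduler-agnostic.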
[jira] [Resolved] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2639. Resolution: Duplicate Can we fix this also as part of YARN-2635. TestClientToAMTokens should run with all types of schedulers Key: YARN-2639 URL: https://issues.apache.org/jira/browse/YARN-2639 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2639-1.patch, YARN-2639-2.patch TestClientToAMTokens fails with FairScheduler now. We should let TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157012#comment-14157012 ] Hadoop QA commented on YARN-2639: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672593/YARN-2639-2.patch against trunk revision 29f5200. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5239//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5239//console This message is automatically generated. TestClientToAMTokens should run with all types of schedulers Key: YARN-2639 URL: https://issues.apache.org/jira/browse/YARN-2639 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2639-1.patch, YARN-2639-2.patch TestClientToAMTokens fails with FairScheduler now. We should let TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.11.patch Attaching patch .11, this is based on .10 (nee .7), the preferred approach, with the a factoring change to decrease the impact - the HeadroomProvider is now limited to just the CapacityScheduler area / FiCaSchedulerApp. It's actually possible to remove the HeadroomProvider altogether in favor of adding more members to the scheduler app, but I think it actually looks better factored this way (the functional result would be the same). Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
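For context, the provider indirection being described could be sketched along these lines (the class, field, and method names are guesses from the comment, not the contents of the .11 patch): the scheduler app holds a small provider object that the queue keeps up to date, so headroom is derived from current limits whenever the AM asks, rather than being a value cached at allocation time.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/**
 * Hypothetical provider held by the FiCaSchedulerApp. The queue pushes the
 * latest user limit and remaining queue capacity into it; the app derives a
 * fresh headroom from those inputs on every heartbeat.
 */
public class CapacityHeadroomProvider {
  private volatile Resource userLimit = Resources.none();
  private volatile Resource queueAvailable = Resources.none();

  /** Called by the queue whenever its limits are recomputed. */
  public void update(Resource newUserLimit, Resource newQueueAvailable) {
    this.userLimit = newUserLimit;
    this.queueAvailable = newQueueAvailable;
  }

  /** Called from the app when responding to an AM heartbeat. */
  public Resource getHeadroom(Resource userConsumed) {
    Resource fromUserLimit = Resources.subtract(userLimit, userConsumed);
    Resource headroom = Resources.componentwiseMin(fromUserLimit, queueAvailable);
    // Clamp negatives to zero; an over-consuming user simply has no headroom.
    return Resources.componentwiseMax(headroom, Resources.none());
  }
}
{code}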
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157067#comment-14157067 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672614/YARN-1198.11.patch against trunk revision a56f3ec. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5240//console This message is automatically generated. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157079#comment-14157079 ] Jian He commented on YARN-2634: --- [~djp], I took latest trunk and ran locally, it actually passes. Would you mind checking again ? thx Test failure for TestClientRMTokens --- Key: YARN-2634 URL: https://issues.apache.org/jira/browse/YARN-2634 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du Assignee: Jian He Priority: Blocker The test get failed as below: {noformat} --- Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens --- Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 22.693 sec FAILURE! java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272) testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 20.087 sec FAILURE! java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283) testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.031 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241) testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.061 sec FAILURE! 
java.lang.AssertionError: expected:getProxy but was:null at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261) testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.07 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408-5.patch Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features Attachments: YARN-2408-5.patch, YARN-2408.4.patch I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408-5.patch) Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408.4.patch) Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.10.patch Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.10.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157133#comment-14157133 ] Xuan Gong commented on YARN-2468: - new patch addressed all the comments Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.10.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157199#comment-14157199 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision a56f3ec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.10.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.11-with-1857.patch Patch combining the last .11 with the latest 1857 patch, to make it easy to check them out together. Tests changed/added for both issues are present and pass (unchanged) Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157217#comment-14157217 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672649/YARN-1198.11-with-1857.patch against trunk revision f679ca3. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5243//console This message is automatically generated. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157234#comment-14157234 ] Jason Lowe commented on YARN-2414: -- Ran into this as well. Any update, [~leftnoteasy]? RM web UI: app page will crash if app is failed before any attempt has been created --- Key: YARN-2414 URL: https://issues.apache.org/jira/browse/YARN-2414 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Zhijie Shen Assignee: Wangda Tan {code} 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/app/application_1407887030038_0001 java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.NullPointerException at
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157245#comment-14157245 ] Zhijie Shen commented on YARN-2527: --- +1, will commit the patch NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157246#comment-14157246 ] Ray Chiang commented on YARN-2635: -- Tested TestRM/TestRMRestart/TestClientToAMTokens. All three tests now pass cleanly using FairScheduler. +1 TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157248#comment-14157248 ] Craig Welch commented on YARN-1680: --- [~airbots] thanks for your updated WIP patch - I've not looked at it extensively yet, but at first glance it looks good to me. On the original patch I noticed that there seems to be a facility for blacklisting racks as well as nodes, and I was concerned that that needed to be addressed as well. It may be in this patch, but it did not look like it to me. I do think it can be without too much difficulty - I think putting the additions (and removals) into sets and then checking to see if the node's rack is in the set during the node iteration would do the trick (I may be off here, but that looks like it would work to me.) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster slow start is set to 1. Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is become unstable(3 Map got killed), MRAppMaster blacklisted unstable NodeManager(NM-4). All reducer task are running in cluster now. MRAppMaster does not preempt the reducers because for Reducer preemption calculation, headRoom is considering blacklisted nodes memory. This makes jobs to hang forever(ResourceManager does not assing any new containers on blacklisted nodes but returns availableResouce considers cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
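To make the set-based suggestion concrete, a minimal sketch of the check follows (the helper and method names are illustrative; the real patch does this inside the scheduler): split the application's blacklist into host names and rack names up front, then charge a node's free capacity against the headroom deduction if either its name or its rack is blacklisted.
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public class BlacklistDeductionSketch {
  /**
   * Sum the free capacity on nodes the application has blacklisted, either
   * directly by host name or through their rack. The caller subtracts the
   * result from the headroom reported to that application's AM.
   */
  public static Resource blacklistedAvailable(
      List<? extends SchedulerNode> nodes, Set<String> blacklist) {
    Set<String> hosts = new HashSet<String>();
    Set<String> racks = new HashSet<String>();
    for (String entry : blacklist) {
      // Assumption: rack entries use the path-like form, e.g. "/default-rack".
      if (entry.startsWith("/")) {
        racks.add(entry);
      } else {
        hosts.add(entry);
      }
    }
    Resource deduction = Resources.createResource(0, 0);
    for (SchedulerNode node : nodes) {
      if (hosts.contains(node.getNodeName())
          || racks.contains(node.getRackName())) {
        Resources.addTo(deduction, node.getAvailableResource());
      }
    }
    return deduction;
  }
}
{code}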
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157265#comment-14157265 ] Karthik Kambatla commented on YARN-1879: Thanks for working on this, Tsuyoshi. Review comments on the latest patch: # Are there cases when we don't want RetryCache enabled? IMO, we should always use the RetryCache (no harm). If we decide on having a config, the default should be true. # I would set DEFAULT_RM_RETRY_CACHE_EXPIRY_MS to {{10 * 60 * 1000}} instead of 60, and the corresponding comment (10 mins) can be removed or moved to the same line. # TestApplicationMasterServiceRetryCache has a few lines longer than 80 chars. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch, YARN-1879.1.patch, YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
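Expressed as code, the suggested defaults would look something like the sketch below; the key strings and constant names are assumptions based on the review comment, not the committed configuration.
{code}
public final class RetryCacheConfigSketch {
  // Mirrors YarnConfiguration.RM_PREFIX ("yarn.resourcemanager.").
  private static final String RM_PREFIX = "yarn.resourcemanager.";

  // Hypothetical key; per comment #1, if a flag is kept at all it should
  // default to true so the retry cache is effectively always on.
  public static final String RM_RETRY_CACHE_ENABLED =
      RM_PREFIX + "retry-cache.enabled";
  public static final boolean DEFAULT_RM_RETRY_CACHE_ENABLED = true;

  // Hypothetical key; 10 minutes written as an expression, which makes a
  // separate "(10 mins)" comment unnecessary.
  public static final String RM_RETRY_CACHE_EXPIRY_MS =
      RM_PREFIX + "retry-cache.expiry-ms";
  public static final long DEFAULT_RM_RETRY_CACHE_EXPIRY_MS = 10 * 60 * 1000;
}
{code}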
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157269#comment-14157269 ] Hudson commented on YARN-2527: -- FAILURE: Integrated in Hadoop-trunk-Commit #6182 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6182/]) YARN-2527. Fixed the potential NPE in ApplicationACLsManager and added test cases for it. Contributed by Benoy Antony. (zjshen: rev 1c93025a1b370db46e345161dbc15e03f829823f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/server/security/TestApplicationACLsManager.java NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Fix For: 2.6.0 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157271#comment-14157271 ] Craig Welch commented on YARN-1680: --- [~john.jian.fang] I should probably not have referred to the cluster level adjustments as blacklisting. What I see is a mechanism (state machine, events, including adding and removing nodes and the unhealthy state/the health monitor) that, I think, ultimately result in the CapacityScheduler.addNode() and removeNode() calls, which modify the clusterResource value. In any case, the blacklisting functionality we are addressing here definitely looks to be application specific needs to be addressed at that level. The issue isn't, so far as I know, related to any blacklisting/node health issues outside the one in play here, as those should work properly for headroom as they adjust the cluster resource. The problem is that the application blacklist activity does not adjust the cluster resource and was previously not involved in the headroom calculation. If it's not the case that cluster level adjustments are being made for nodes then this blacklisting will result in duplication among applications as they independently discover problems with nodes and blacklist them, but that is not a new characteristic of the way the system works. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster slow start is set to 1. Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is become unstable(3 Map got killed), MRAppMaster blacklisted unstable NodeManager(NM-4). All reducer task are running in cluster now. MRAppMaster does not preempt the reducers because for Reducer preemption calculation, headRoom is considering blacklisted nodes memory. This makes jobs to hang forever(ResourceManager does not assing any new containers on blacklisted nodes but returns availableResouce considers cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157274#comment-14157274 ] Ray Chiang commented on YARN-2635: -- Oops, pending Jenkins of course. TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157286#comment-14157286 ] Craig Welch commented on YARN-1680: --- This does bring up what I think could be an issue; I'm not sure if it is what you were getting at before or not, [~john.jian.fang], but we could well be introducing a new bug here unless we are careful. I don't see any connection between the scheduler level resource adjustments and the application level adjustments, so if an application had problems with a node and blacklisted it, and then the cluster did as well, the resource value of the node would effectively be removed from the headroom twice (once when the application adds it to its new blacklist reduction, and a second time when the cluster removes its value from the clusterResource). I think this could be a problem and that it could be addressed, but it's something to think about, and I don't think the current approach addresses it. [~airbots], [~jlowe], thoughts? availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headroom still includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, yet it returns an availableResource that still counts that free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
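To make the double-counting concern concrete, here is a small, self-contained example with made-up numbers (not taken from the JIRA or any patch): one 8 GB node is first blacklisted by the application and later also removed from the cluster resource by the cluster-level health check.
{code}
public class HeadroomDoubleCountExample {
  public static void main(String[] args) {
    long clusterMB = 4 * 8192;     // 4 NMs x 8 GB
    long usedMB = 20 * 1024;       // memory already allocated to the app
    long appBlacklistMB = 8192;    // application-level blacklist deduction

    // Application-level deduction only: the node is subtracted once.
    long headroom = clusterMB - usedMB - appBlacklistMB;
    System.out.println("headroom after app-level blacklist: " + headroom + " MB");

    // If the cluster later marks the same node unhealthy, clusterResource
    // shrinks by the same 8 GB, so the node is effectively subtracted twice.
    clusterMB -= 8192;
    long doubleCounted = clusterMB - usedMB - appBlacklistMB;
    System.out.println("headroom after cluster-level removal too: " + doubleCounted + " MB");
  }
}
{code}
The second value goes negative (or would be clamped to zero), understating the resources actually available to the application.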
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157291#comment-14157291 ] Hadoop QA commented on YARN-2468: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision f679ca3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.10.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157296#comment-14157296 ] Hadoop QA commented on YARN-2635: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672637/YARN-2635-2.patch against trunk revision 6ac1051. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5242//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5242//console This message is automatically generated. TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information
[ https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2598: -- Attachment: YARN-2598.2.patch Rebase the patch against the latest trunk GHS should show N/A instead of null for the inaccessible information Key: YARN-2598 URL: https://issues.apache.org/jira/browse/YARN-2598 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2598.1.patch, YARN-2598.2.patch When the user doesn't have the access to an application, the app attempt information is not visible to the user. ClientRMService will output N/A, but GHS is showing null, which is not user-friendly. {code} 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: http://nn.example.com:8188/ws/v1/timeline/ 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at nn.example.com/240.0.0.11:8050 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History server at nn.example.com/240.0.0.11:10200 Application Report : Application-Id : application_1411586934799_0001 Application-Name : Sleep job Application-Type : MAPREDUCE User : hrt_qa Queue : default Start-Time : 1411586956012 Finish-Time : 1411586989169 Progress : 100% State : FINISHED Final-State : SUCCEEDED Tracking-URL : null RPC Port : -1 AM Host : null Aggregate Resource Allocation : N/A Diagnostics : null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
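As a rough illustration of the substitution being proposed (names here are illustrative, not the attached patch), a helper like the following could be applied to each nullable field when the report is rendered for a caller without access:
{code}
final class ReportStrings {
  static final String UNAVAILABLE = "N/A";

  // Replace a missing value with "N/A" so GHS output matches ClientRMService.
  static String orUnavailable(String value) {
    return value == null ? UNAVAILABLE : value;
  }
}

// Hypothetical usage when building the report:
//   trackingUrl = ReportStrings.orUnavailable(attempt == null ? null : attempt.getTrackingUrl());
//   amHost      = ReportStrings.orUnavailable(attempt == null ? null : attempt.getHost());
{code}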
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157302#comment-14157302 ] Steve Loughran commented on YARN-913: - Failing test is still the (believed unrelated) Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 379.565 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testDSRestartWithPreviousRunningContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 38.715 sec FAILURE! java.lang.AssertionError: client failed at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSRestartWithPreviousRunningContainers(TestDistributedShell.java:319) Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157311#comment-14157311 ] Hudson commented on YARN-2628: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6183 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6183/]) YARN-2628. Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free. Contributed by Varun Vasudev (jianhe: rev 054f28552687e9b9859c0126e16a2066e20ead3f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free - Key: YARN-2628 URL: https://issues.apache.org/jira/browse/YARN-2628 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch We've noticed that if you run the CapacityScheduler with the DominantResourceCalculator, sometimes apps will end up with containers in a reserved state even though free slots are available. The root cause seems to be this piece of code from CapacityScheduler.java - {noformat} // Try to schedule more if there are no reservations to fulfill if (node.getReservedContainer() == null) { if (Resources.greaterThanOrEqual(calculator, getClusterResource(), node.getAvailableResource(), minimumAllocation)) { if (LOG.isDebugEnabled()) { LOG.debug("Trying to schedule on node: " + node.getNodeName() + ", available: " + node.getAvailableResource()); } root.assignContainers(clusterResource, node, false); } } else { LOG.info("Skipping scheduling since node " + node.getNodeID() + " is reserved by application " + node.getReservedContainer().getContainerId().getApplicationAttemptId()); } {noformat} The code is meant to check if a node has any slots available for containers. Since it uses the greaterThanOrEqual function, we end up in a situation where greaterThanOrEqual returns true even though we may not have enough CPU or memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
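The description above implies that a per-dimension check is needed rather than a dominant-share comparison. A sketch of that idea follows; it is illustrative and not necessarily how the committed patch fixes it.
{code}
// With DominantResourceCalculator, greaterThanOrEqual() can return true when
// only the dominant dimension (say, memory) is large enough. Checking each
// dimension explicitly avoids scheduling attempts that can only end in a
// reservation.
private static boolean hasRoomFor(Resource available, Resource minimumAllocation) {
  return available.getMemory() >= minimumAllocation.getMemory()
      && available.getVirtualCores() >= minimumAllocation.getVirtualCores();
}

// ...and in the scheduling loop, roughly:
// if (node.getReservedContainer() == null
//     && hasRoomFor(node.getAvailableResource(), minimumAllocation)) {
//   root.assignContainers(clusterResource, node, false);
// }
{code}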
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157336#comment-14157336 ] Sandy Ryza commented on YARN-1414: -- Awesome with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2598) GHS should show N/A instead of null for the inaccessible information
[ https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157349#comment-14157349 ] Hadoop QA commented on YARN-2598: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672667/YARN-2598.2.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5245//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5245//console This message is automatically generated. GHS should show N/A instead of null for the inaccessible information Key: YARN-2598 URL: https://issues.apache.org/jira/browse/YARN-2598 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2598.1.patch, YARN-2598.2.patch When the user doesn't have the access to an application, the app attempt information is not visible to the user. ClientRMService will output N/A, but GHS is showing null, which is not user-friendly. {code} 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: http://nn.example.com:8188/ws/v1/timeline/ 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at nn.example.com/240.0.0.11:8050 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History server at nn.example.com/240.0.0.11:10200 Application Report : Application-Id : application_1411586934799_0001 Application-Name : Sleep job Application-Type : MAPREDUCE User : hrt_qa Queue : default Start-Time : 1411586956012 Finish-Time : 1411586989169 Progress : 100% State : FINISHED Final-State : SUCCEEDED Tracking-URL : null RPC Port : -1 AM Host : null Aggregate Resource Allocation : N/A Diagnostics : null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157431#comment-14157431 ] Benoy Antony commented on YARN-2527: Thanks a lot, [~zjshen]. NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Fix For: 2.6.0 Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5-4.patch ContainerId@toString() is unreadable for epoch 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch 0) after YARN-2182. For e.g, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
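For readers following the discussion, here is one way the formatting could work: keep the pre-YARN-2182 string for epoch 0 and prefix the epoch only when it is non-zero. This is a sketch with assumed field widths, not the format the attached patch necessarily adopts.
{code}
// Illustrative only: container_<ts>_<app>_<attempt>_<id> for epoch 0,
// container_e<epoch>_<ts>_<app>_<attempt>_<id> otherwise.
static String containerIdToString(long clusterTimestamp, int appId,
    int attemptId, long containerId, int epoch) {
  StringBuilder sb = new StringBuilder("container_");
  if (epoch > 0) {
    sb.append('e').append(epoch).append('_');
  }
  sb.append(clusterTimestamp).append('_');
  sb.append(String.format("%04d", appId)).append('_');
  sb.append(String.format("%02d", attemptId)).append('_');
  sb.append(String.format("%06d", containerId));
  return sb.toString();
}
{code}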
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157461#comment-14157461 ] Santosh Marella commented on YARN-556: -- Referencing YARN-2476 here to ensure the specific scenario mentioned there is fixed as part of this JIRA. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157490#comment-14157490 ] Hadoop QA commented on YARN-2562: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672691/YARN-2562.5-4.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5246//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5246//console This message is automatically generated. ContainerId@toString() is unreadable for epoch 0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch 0) after YARN-2182. For e.g, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157515#comment-14157515 ] Karthik Kambatla commented on YARN-2635: By the way, these tests take a long time to run. Do we want to run against all three schedulers? Or, would it be enough to run against CS and FS? TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
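One way to run the same test class against several schedulers is JUnit 4's Parameterized runner; the sketch below is illustrative and not necessarily how the attached patch does it.
{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestRMRestart {
  private final Class<? extends ResourceScheduler> schedulerClass;
  private YarnConfiguration conf;

  public TestRMRestart(Class<? extends ResourceScheduler> schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Parameters
  public static Collection<Object[]> schedulers() {
    // CS and FS only, per the comment above; add FifoScheduler.class here if
    // all three schedulers should be exercised.
    return Arrays.asList(new Object[][] {
        {CapacityScheduler.class}, {FairScheduler.class}});
  }

  @Before
  public void setup() {
    conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER, schedulerClass,
        ResourceScheduler.class);
  }

  // ...existing test methods unchanged; they pick up 'conf' from setup().
}
{code}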
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157524#comment-14157524 ] Craig Welch commented on YARN-1198: --- FYI, it's not possible to call getAndCalculateHeadroom because nothing can synchronize on the queue during the allocation call without deadlocking - this is why it's necessary to break out the headroom calculation the way it is done here and store some items (such as the LeafQueue.User, which comes from the user manager and syncs on the queue) to avoid any synchronization on the queue itself during the final headroom calculation in the allocate/getHeadroom step. It's not a bad thing to do anyway, since it reduces the number of operations (somewhat) in that final headroom calculation - but it is also why we can't just call getAndCalculateHeadroom as such (unchanged) in allocate(). Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when * A new node is added/removed from the cluster * A new container is getting assigned to the application. However there are potentially a lot of situations which are not considered in this calculation * If a container finishes then the headroom for that application will change and the AM should be notified accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly, if a container is assigned to either application (app1/app2) then both AMs should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted in the same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible..) * Also, when an admin user refreshes the queue, headroom has to be updated. These are all potential bugs in the headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
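A rough sketch of the pattern being described: capture the inputs of the headroom formula while the scheduler already holds the queue lock (during assignment), so that the later allocate()/getHeadroom() step never synchronizes on the LeafQueue again. Class and field names here are illustrative assumptions, not the patch itself.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

class HeadroomSnapshot {
  private final Resource userLimit;    // snapshot from LeafQueue.User at assign time
  private final Resource queueMaxCap;
  private final Resource userConsumed;

  HeadroomSnapshot(Resource userLimit, Resource queueMaxCap, Resource userConsumed) {
    this.userLimit = userLimit;
    this.queueMaxCap = queueMaxCap;
    this.userConsumed = userConsumed;
  }

  // headroom = min(userLimit, queueMaxCap) - userConsumed, floored at zero,
  // computed without touching (or locking) the queue.
  Resource getHeadroom() {
    int mem = Math.max(0,
        Math.min(userLimit.getMemory(), queueMaxCap.getMemory())
            - userConsumed.getMemory());
    int vcores = Math.max(0,
        Math.min(userLimit.getVirtualCores(), queueMaxCap.getVirtualCores())
            - userConsumed.getVirtualCores());
    return Resource.newInstance(mem, vcores);
  }
}
{code}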
[jira] [Created] (YARN-2641) improve node decommission latency in RM.
zhihai xu created YARN-2641: --- Summary: improve node decommission latency in RM. Key: YARN-2641 URL: https://issues.apache.org/jira/browse/YARN-2641 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu improve node decommission latency in RM. Currently, node decommission only happens after the RM receives a nodeHeartbeat from the NodeManager. The node heartbeat interval is configurable; the default value is 1 second. It would be better to do the decommission during the RM refresh (NodesListManager) instead of on nodeHeartbeat (ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
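A sketch of the idea in this JIRA (not a committed change): when the exclude list is refreshed, push a DECOMMISSION event for each newly excluded running node immediately instead of waiting for its next heartbeat to be rejected by ResourceTrackerService. The helper class name is made up; the RM types used are the existing ones.
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.NodesListManager;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType;

class EagerDecommissioner {
  // Walk the currently tracked nodes and decommission any that are no longer
  // valid according to the refreshed include/exclude lists.
  static void decommissionExcludedNodes(RMContext rmContext,
      NodesListManager nodesListManager) {
    for (Map.Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
      if (!nodesListManager.isValidNode(entry.getValue().getHostName())) {
        rmContext.getDispatcher().getEventHandler().handle(
            new RMNodeEvent(entry.getKey(), RMNodeEventType.DECOMMISSION));
      }
    }
  }
}
{code}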
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157570#comment-14157570 ] Jun Gong commented on YARN-2640: [~ozawa], thank you for telling me. Close it now. TestDirectoryCollection.testCreateDirectories failed Key: YARN-2640 URL: https://issues.apache.org/jira/browse/YARN-2640 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-2640.2.patch, YARN-2640.patch When running test mvn test -Dtest=TestDirectoryCollection, it failed: {code} Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) Time elapsed: 0.969 sec FAILURE! java.lang.AssertionError: local dir parent not created with proper permissions expected:rwxr-xr-x but was:rwxrwxr-x at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) {code} I found it was because testDiskSpaceUtilizationLimit ran before testCreateDirectories when running test, then directory dirA was created in test function testDiskSpaceUtilizationLimit. When testCreateDirectories tried to create dirA with specified permission, it found dirA has already been there and it did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2612) Some completed containers are not reported to NM
[ https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong resolved YARN-2612. Resolution: Duplicate Some completed containers are not reported to NM Key: YARN-2612 URL: https://issues.apache.org/jira/browse/YARN-2612 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Fix For: 2.6.0 We are testing RM work preserving restart and found the following logs when we ran a simple MapReduce PI job. Some completed containers that were already pulled by the AM were never reported back to the NM, so the NM continuously reported the completed containers even though the AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In YARN-1372, the NM will report completed containers to the RM until it gets an ACK from the RM. If the AM does not call allocate, which means that the AM does not ack the RM, the RM will not ack the NM. We ([~chenchun]) have observed these two cases when running the MapReduce task 'pi': 1) The RM sends completed containers to the AM. After receiving them, the AM thinks it has done its work and does not need resources, so it does not call allocate. 2) When the AM finishes, it could not ack to the RM because the AM itself has not finished yet. We think that when RMAppAttempt calls BaseFinalTransition, it means the AppAttempt has finished, so the RM could send this AppAttempt's completed containers to the NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157579#comment-14157579 ] Hadoop QA commented on YARN-2635: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672709/yarn-2635-3.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5247//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5247//console This message is automatically generated. TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)