[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535062#comment-14535062
 ] 

Hadoop QA commented on YARN-3476:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 5m 47s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | 41m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731375/0002-YARN-3476.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f4ebbc6 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7810/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7810/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7810/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7810/console |


This message was automatically generated.

 Nodemanager can fail to delete local logs if log aggregation fails
 --

 Key: YARN-3476
 URL: https://issues.apache.org/jira/browse/YARN-3476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Rohith
  Labels: BB2015-05-TBR
 Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, 
 0002-YARN-3476.patch


 If log aggregation encounters an error trying to upload the file, then the 
 underlying TFile can throw an IllegalStateException, which will bubble up 
 through the top of the thread and prevent the application logs from being 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521471#comment-14521471
 ] 

Hadoop QA commented on YARN-3476:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 7m 40s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 5m 58s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 48m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12729459/0001-YARN-3476.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / de9404f |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/whitespace.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7554/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7554/console |


This message was automatically generated.



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509841#comment-14509841
 ] 

Jason Lowe commented on YARN-3476:
--

I'm OK with deleting the logs upon error uploading.  It should be a rare 
occurrence, and log availability is already a best-effort rather than 
guaranteed service.  Even if we try to retain the logs it has questionable 
benefit in practice, as the history of a job always points to the aggregated 
logs, not the node's copy of the logs, and thus the logs will still be lost 
from the end-user's point of view.  Savvy users may realize the logs could 
still be on the original node, but most won't know to check there or how to 
form the URL to find them.  If we always point to the node then that defeats 
one of the features of log aggregation, since loss of the node will mean the 
node's URL is bad and we fail to show the logs even if they are aggregated.

So for now I say we keep it simple and just clean up the files on errors to 
prevent leaks.  Speaking of which, I took a look at the patch.  It will fix the 
particular error we saw with TFiles, but there could easily be other 
non-IOExceptions that creep out of the code, especially as it is maintained 
over time.  Would it be better to wrap the cleanup in a finally block or 
something a little more broadly applicable to errors that occur?
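
To illustrate the finally-block idea, here is a minimal Java sketch. It is not the NodeManager code or the attached patch: `LogCleanupSketch`, `Uploader`, and `aggregateAndCleanUp` are hypothetical names, and the upload failure is simulated with a lambda. The point is only that cleanup placed in `finally` runs for an `IOException`, an `IllegalStateException`, or any other throwable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical names throughout; this is not the NodeManager's actual code.
class LogCleanupSketch {

    /** Stand-in for the upload step, which may fail with any exception type. */
    interface Uploader {
        void upload(Path localLogDir) throws IOException;
    }

    /** Uploads the logs, then deletes the local copy whether or not the upload succeeded. */
    static void aggregateAndCleanUp(Path localLogDir, Uploader uploader) {
        try {
            uploader.upload(localLogDir);
        } catch (IOException e) {
            // Expected failure mode: report it and fall through to cleanup.
            System.err.println("Upload failed: " + e.getMessage());
        } finally {
            // Runs for IOException, IllegalStateException, or anything else.
            deleteRecursively(localLogDir);
        }
    }

    /** Best-effort recursive delete; children are removed before their parents. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    System.err.println("Could not delete " + p);
                }
            });
        } catch (IOException e) {
            System.err.println("Could not walk " + root);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("app-logs");
        Files.write(dir.resolve("stdout"), "hello".getBytes());
        try {
            aggregateAndCleanUp(dir, d -> {
                throw new IllegalStateException("simulated TFile failure");
            });
        } catch (IllegalStateException e) {
            // The unchecked exception still propagates, but cleanup has already run.
        }
        System.out.println("local logs still present: " + Files.exists(dir));
    }
}
```

In this sketch the unchecked exception is deliberately not swallowed; whether the real aggregator thread should log it, rethrow it, or both is a separate decision from guaranteeing the cleanup.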



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498387#comment-14498387
 ] 

Sunil G commented on YARN-3476:
---

Hi [~jlowe] and [~rohithsharma],

Retention logic to handle this error may become complex when multiple 
failures are seen during aggregation across applications. If this happens 
rarely, strong retention logic with a timer is helpful.

More generally, considering that more failures may occur, cleaning up after 
aggregation saves disk space. That is acceptable because we have already 
encountered an error, and there may be no real pressure to deliver 100% 
complete logs when aggregation has failed.



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-16 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498951#comment-14498951
 ] 

Rohith commented on YARN-3476:
--

Thanks [~sunilg] for sharing your thoughts.
If we go for retention logic or a timer, then for NM recovery the retention 
state would have to be stored in the state store.  The NM would then need to 
support state store updates in AggregatddLogService, similar to 
NonAggregatedLogHandler.

[~jlowe] I attached a patch with a straightforward fix that handles the 
exception and does the post-aggregation cleanup. Kindly share your opinion on 
the 2 approaches, i.e. 1. handling the exception and 2. retention logic.



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496099#comment-14496099
 ] 

Hadoop QA commented on YARN-3476:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12724974/0001-YARN-3476.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for 
this patch. Also please list what manual steps were performed to verify 
this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7346//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7346//console

This message is automatically generated.



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-13 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492593#comment-14492593
 ] 

Rohith commented on YARN-3476:
--

bq. Not all of the application's logs were available in HDFS because it 
encountered an error (token-related) trying to upload the logs.
Is this failure caused by the IllegalStateException?

There are 2 options:
# do post-aggregation cleanup by handling IllegalStateException, OR
# schedule a timer for those log directories which are neither uploaded nor 
deleted.
Is the IllegalStateException what causes the logs not to be found in HDFS? If 
it is not, then I think the simple way to handle this is the 1st approach.
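
For comparison, the second option (a timer that sweeps leftover log directories) could look roughly like the Java sketch below. Everything in it is an assumption made for illustration: `LeftoverLogSweeper`, `sweep`, and `start` are hypothetical names, and using directory modification time as the staleness test is a stand-in for whatever bookkeeping a real implementation would need.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a timer-based sweep; not NodeManager code.
class LeftoverLogSweeper {

    /**
     * Deletes directories directly under root whose last-modified time is
     * more than retainMillis before nowMillis. Returns the deleted paths.
     */
    static List<Path> sweep(Path root, long retainMillis, long nowMillis) throws IOException {
        List<Path> stale = new ArrayList<>();
        // First pass: collect candidates so nothing is deleted mid-listing.
        try (DirectoryStream<Path> appDirs = Files.newDirectoryStream(root)) {
            for (Path appDir : appDirs) {
                long age = nowMillis - Files.getLastModifiedTime(appDir).toMillis();
                if (Files.isDirectory(appDir) && age > retainMillis) {
                    stale.add(appDir);
                }
            }
        }
        // Second pass: remove each stale directory tree.
        for (Path appDir : stale) {
            deleteRecursively(appDir);
        }
        return stale;
    }

    /** Recursive delete; children are removed before their parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    /** Runs sweep() repeatedly; the caller shuts the returned executor down. */
    static ScheduledExecutorService start(Path root, long retainMillis, long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> {
            try {
                sweep(root, retainMillis, System.currentTimeMillis());
            } catch (Exception e) {
                // Catch everything: an uncaught exception would suppress later runs.
                System.err.println("Sweep failed: " + e);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

A call such as `LeftoverLogSweeper.start(logRoot, retainMillis, 60)` would start the periodic sweep. The sketch keeps no persistent state, so it says nothing about surviving a NodeManager restart.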



[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490233#comment-14490233
 ] 

Jason Lowe commented on YARN-3476:
--

bq. Such cases we can use the aggregated log-retention configuration i.e 
'yarn.log-aggregation.retain-seconds' for deleting from disk.

No, that's not going to be OK.  That setting is the lifetime on the 
aggregation filesystem, and what we're considering is the lifetime on the 
local disk.  I can hold a lot more logs in HDFS than I can on the local disk.
