[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2019-10-06 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945566#comment-16945566
 ] 

Hadoop QA commented on YARN-8627:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-8627 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950128/YARN-8627.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24914/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch, 
> YARN-8627.003.patch, app-domain-logs.zip
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2019-10-06 Thread Dinesh Chitlangia (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945563#comment-16945563
 ] 

Dinesh Chitlangia commented on YARN-8627:
-

[~rohithsharma] Did you get a chance to review this? Thanks.

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch, 
> YARN-8627.003.patch, app-domain-logs.zip
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-11-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704368#comment-16704368
 ] 

Hadoop QA commented on YARN-8627:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
20s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950128/YARN-8627.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4b37ae05ae3d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c9bfca2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22758/testReport/ |
| Max. process+thread count | 453 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22758/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> EntityGroupFSTimelineStore 

[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-11-29 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704328#comment-16704328
 ] 

Tarun Parimi commented on YARN-8627:


[~rohithsharma] , I was unable to reproduce locally. Still cant figure out why 
some Tez applications write the domainlog to the wrong directory in hdfs. I 
think its likely a bug in Tez. Since this patch fixes the separate issue of 
/ats/done hdfs folder getting accumulated. Can you review the patch once again? 
 I am reattaching it to kick in jenkins build again.

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch, 
> YARN-8627.003.patch, app-domain-logs.zip
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-10-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646019#comment-16646019
 ] 

Hadoop QA commented on YARN-8627:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8627 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8627 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22149/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch, 
> app-domain-logs.zip
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-10-10 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645983#comment-16645983
 ] 

Tarun Parimi commented on YARN-8627:


[~rohithsharma] Attached the domain log files for three such applications in 
[^app-domain-logs.zip] . All these are Tez applications. I can see that the 
domainId of the two domain files are a slightly different, where the one with a 
wrong directory structure has  "Tez_ATS_application_1534014715392_830173_1", 
while the one in proper directory is 
"Tez_ATS_application_1534014715392_830173". 

 The domainId "Tez_ATS_application_1534014715392_830173_1" seems to be created 
by 
{{org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager#createDAGDomain}}
 and the other domain by 
{{org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager#createSessionDomain}}
 .

No idea how the DAGDomain log gets written in the wrong directory structure. 
Will try to see if I can reproduce locally based on this.

 

 

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch, 
> app-domain-logs.zip
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-10-08 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641422#comment-16641422
 ] 

Tarun Parimi commented on YARN-8627:


Had a discussion with [~rohithsharma] offline. Summarizing our discussion here.

bq. Does this happening in HDFS or any other Filesystem?
I have only observed this in hdfs /ats/done directories.

bq. Does the cluster is enabled with 
yarn.timeline-service.entity-group-fs-store.with-user-dir flag?
{{yarn.timeline-service.entity-group-fs-store.with-user-dir}} is not set.

The recursive delete logic of /ats/done is however bugged. Since this patch 
fixes it without any side affects, can we fix the root cause in another jira? 

I am still trying to get contents of the log files as I currently don't have 
access to the cluster and I am not able to reproduce these domain log files in 
my local. Will share it as soon as I get them and we can have separate jira for 
that if its fine.

 

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-10-05 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640125#comment-16640125
 ] 

Wangda Tan commented on YARN-8627:
--

Thanks [~rohithsharma] for reviewing the patch.

[~tarunparimi], could u check the last comment? 

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-24 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626792#comment-16626792
 ] 

Rohith Sharma K S commented on YARN-8627:
-

bq. One thing I noticed is that only the "domainlog" file was present in these 
type of repeated appid directories. Other types such as summarylog/entitylog 
were present only in the normal expected directory structure.
Interesting!! It means retrieving these entities _*had problem i.e with ACLs 
but unidentified*_. Can we see content of both the domain log files? I still 
would like to find root cause of this issue! I suspect something going wrong 
while moving from active to done directory.
Some doubts
# Does this happening in HDFS or any other Filesystem? 
# Does the cluster is enabled with 
*yarn.timeline-service.entity-group-fs-store.with-user-dir* flag? 


> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626418#comment-16626418
 ] 

Wangda Tan commented on YARN-8627:
--

[~rohithsharma], could u help to review the latest comment?

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-19 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620464#comment-16620464
 ] 

Tarun Parimi commented on YARN-8627:


Thanks for the review [~rohithsharma]. 

I tested for folder path appid/appid/appid and this patch handles it fine. This 
is because only the first appid directory encountered will be deleted 
recursively after its child directories have been tested for modification time. 

I agree that we should try to find root cause for the actual creation of 
repeated directories. I wasn't able to reproduce this locally so wasn't able to 
dig much deeper.

I had looked at the hadoop fs -ls -R output of /ats/done for the cluster in 
which I had observed the issue. One thing I noticed is that only the 
"domainlog" file was present in these type of repeated appid directories. Other 
types such as summarylog/entitylog were present only in the normal expected 
directory structure. Also two domainlogs are created and they have different 
size and modification time of one causing problem is much greater at 13:00. But 
not sure on the exact scenario which is causing this to happen. A sample is 
below.

 
{code:java}
drwxrwx--- - appuser hadoop 0 2017-10-16 13:01 
/ats/done/1508116310016//000/application_1508116310016_0010
drwxrwx--- - appuser hadoop 0 2017-10-16 12:16 
/ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 88 2017-10-16 12:20 
/ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01/domainlog-appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 92324 2017-10-16 12:22 
/ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01/summarylog-appattempt_1508116310016_0010_01
drwxrwxrwx - appuser hadoop 0 2017-10-16 13:00 
/ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010
drwxrwxrwx - appuser hadoop 0 2017-10-16 13:00 
/ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010/appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 90 2017-10-16 13:00 
/ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010/appattempt_1508116310016_0010_01/domainlog-appattempt_1508116310016_0010_01
 
{code}
 

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-19 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620341#comment-16620341
 ] 

Rohith Sharma K S commented on YARN-8627:
-

Thanks [~tarunparimi] for the patch. One question is how does 
"/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219"
 created? I think we should find root cause for this. I suspect moving from 
active to done directory has problem. What will happen if folder path as 
appid/appid/appid?

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-08-13 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578293#comment-16578293
 ] 

Tarun Parimi commented on YARN-8627:


[~rohithsharma] , please review the patch when free.

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571082#comment-16571082
 ] 

genericqa commented on YARN-8627:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
20s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934581/YARN-8627.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c2d1528981e6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e4e02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21527/testReport/ |
| Max. process+thread count | 316 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21527/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> EntityGroupFSTimelineStore 

[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-08-06 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571043#comment-16571043
 ] 

Tarun Parimi commented on YARN-8627:


Adding a test case in TestEntityGroupFSTimelineStore#testCleanLogs to check the 
cleaning of an app directory with multiple attempt dirs and an app dir within 
an app dir.

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, it is a different folder encountering 
> the error. As a result, the thread is not able to clean all the old done 
> directories, since it stops after this error. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570274#comment-16570274
 ] 

genericqa commented on YARN-8627:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
17s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934478/YARN-8627.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1c103bfa9c2d 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bcfc985 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21516/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21516/console |
| Powered by | Apache 

[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-08-06 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569873#comment-16569873
 ] 

Tarun Parimi commented on YARN-8627:


On further analysis, I found that this error occurs for application directories 
which themselves have another application directory such as 
/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219.
 In audit logs I see that ats tries to list the folder after deleting it, which 
causes the error.
{code:java}
19:35:45,944 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219
dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,945 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219
dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,946 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219/appattempt_1500089190015_17219_01
  dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,947 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219
dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,948 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219/appattempt_1500089190015_17219_01
  dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,952 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=delete  
src=/ats/done/1500089190015//017/application_1500089190015_17219
dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
19:35:45,953 INFO FSNamesystem.audit: allowed=true  
ugi=yarn/rm-u...@example.com (auth:KERBEROS)ip=/x.x.x.x cmd=listStatus  
src=/ats/done/1500089190015//017/application_1500089190015_17219
dst=nullperm=null   proto=rpc   
callerContext=yarn_ats_server_v1_5
{code}
I am not sure how this directory structure got created in the first place. But 
the cleaner thread should not list a directory after deleting the same. 

The {{EntityGroupFSTimelineStore#cleanLogs}} method tries to delete the parent 
directory {{dirpath}}, while it is iterating over the same dirpath. It should 
only try to delete its children so as to avoid these issues. Testing a patch 
which does this in my environment and it seems to fix the issue. Will upload a 
patch soon after doing further tests.

 

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> The EntityLogCleaner threads exits with the following ERROR every time it 
> runs.  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
>