[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945566#comment-16945566 ]

Hadoop QA commented on YARN-8627:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-8627 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950128/YARN-8627.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24914/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> --------------------------------------------------------------------
>
>                 Key: YARN-8627
>                 URL: https://issues.apache.org/jira/browse/YARN-8627
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>         Attachments: YARN-8627.001.patch, YARN-8627.002.patch, YARN-8627.003.patch, app-domain-logs.zip
>
>
> The EntityLogCleaner thread exits with the following ERROR every time it runs.
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files
> java.io.FileNotFoundException: File hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1069)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>     at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>
> {code}
>
> Each time the thread gets scheduled, it is a different folder encountering the error. As a result, the thread is not able to clean all the old done directories, since it stops after this error.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
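The effect described in the issue above, where one FileNotFoundException ends the whole cleaning pass and the remaining directories are never reached, can be illustrated with a minimal sketch. This is not the attached patch and it does not use the Hadoop FileSystem API: the names DirLister, cleanAborting and cleanTolerant are hypothetical and exist only for this illustration. It only contrasts a pass that wraps the whole loop in one try block with a pass that handles the missing directory per entry.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CleanerPassSketch {

    /** Hypothetical stand-in for a directory listing call that can fail. */
    interface DirLister {
        List<String> list(String dir) throws FileNotFoundException;
    }

    /** One try block around the whole loop: the first missing directory ends the pass. */
    static List<String> cleanAborting(List<String> dirs, DirLister lister) {
        List<String> visited = new ArrayList<>();
        try {
            for (String dir : dirs) {
                lister.list(dir);
                visited.add(dir);
            }
        } catch (FileNotFoundException e) {
            // The directories after the failing one are never reached in this pass.
        }
        return visited;
    }

    /** Per-directory handling: a missing directory is skipped and the pass continues. */
    static List<String> cleanTolerant(List<String> dirs, DirLister lister) {
        List<String> visited = new ArrayList<>();
        for (String dir : dirs) {
            try {
                lister.list(dir);
                visited.add(dir);
            } catch (FileNotFoundException e) {
                // Assumption for this sketch: a missing directory needs no further cleaning.
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Illustrative directory names only; "app_2" plays the already-deleted directory.
        List<String> dirs = Arrays.asList("app_1", "app_2", "app_3");
        DirLister lister = dir -> {
            if (dir.equals("app_2")) {
                throw new FileNotFoundException("File " + dir + " does not exist.");
            }
            return Collections.emptyList();
        };
        System.out.println("aborting pass visited: " + cleanAborting(dirs, lister));
        System.out.println("tolerant pass visited: " + cleanTolerant(dirs, lister));
    }
}
```

Whether skipping is the right reaction depends on why the directory is missing; the discussion in this thread points at the recursive delete logic itself, so the sketch should be read as a picture of the symptom rather than of the fix.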
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945563#comment-16945563 ]

Dinesh Chitlangia commented on YARN-8627:
-----------------------------------------

[~rohithsharma] Did you get a chance to review this? Thanks.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704368#comment-16704368 ]

Hadoop QA commented on YARN-8627:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950128/YARN-8627.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4b37ae05ae3d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c9bfca2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22758/testReport/ |
| Max. process+thread count | 453 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22758/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704328#comment-16704328 ]

Tarun Parimi commented on YARN-8627:
------------------------------------

[~rohithsharma], I was unable to reproduce this locally. I still can't figure out why some Tez applications write the domainlog to the wrong directory in hdfs. I think it's likely a bug in Tez. Since this patch fixes the separate issue of the /ats/done hdfs folder accumulating, can you review the patch once again? I am reattaching it to kick off the jenkins build again.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646019#comment-16646019 ]

Hadoop QA commented on YARN-8627:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8627 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8627 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22149/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645983#comment-16645983 ]

Tarun Parimi commented on YARN-8627:
------------------------------------

[~rohithsharma] Attached the domain log files for three such applications in [^app-domain-logs.zip]. All of these are Tez applications.

I can see that the domainIds of the two domain files are slightly different: the one with the wrong directory structure has "Tez_ATS_application_1534014715392_830173_1", while the one in the proper directory has "Tez_ATS_application_1534014715392_830173".

The domainId "Tez_ATS_application_1534014715392_830173_1" seems to be created by {{org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager#createDAGDomain}} and the other domain by {{org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager#createSessionDomain}}.

No idea how the DAGDomain log gets written in the wrong directory structure. I will try to see if I can reproduce it locally based on this.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641422#comment-16641422 ]

Tarun Parimi commented on YARN-8627:
------------------------------------

Had a discussion with [~rohithsharma] offline. Summarizing our discussion here.

bq. Is this happening in HDFS or any other filesystem?
I have only observed this in hdfs /ats/done directories.

bq. Is the cluster enabled with the yarn.timeline-service.entity-group-fs-store.with-user-dir flag?
{{yarn.timeline-service.entity-group-fs-store.with-user-dir}} is not set.

The recursive delete logic of /ats/done is, however, buggy. Since this patch fixes it without any side effects, can we fix the root cause in another jira? I am still trying to get the contents of the log files, as I currently don't have access to the cluster and I am not able to reproduce these domain log files locally. I will share them as soon as I get them, and we can have a separate jira for that if that's fine.
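As a rough illustration of the recursive delete behaviour discussed in this thread (only the outermost appid directory is deleted recursively, after everything beneath it has been checked for modification time), here is a minimal in-memory sketch. It is not the code from the attached patches and it does not use the Hadoop FileSystem API. The Node class, the "application_" name check and the method names are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DoneDirCleanerSketch {

    /** Hypothetical in-memory stand-in for a file or directory under /ats/done. */
    static final class Node {
        final String name;
        final boolean isDir;
        final long modTime; // only meaningful for files in this sketch
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir, long modTime) {
            this.name = name;
            this.isDir = isDir;
            this.modTime = modTime;
        }

        static Node dir(String name, Node... kids) {
            Node d = new Node(name, true, 0L);
            for (Node k : kids) {
                d.children.add(k);
            }
            return d;
        }

        static Node file(String name, long modTime) {
            return new Node(name, false, modTime);
        }
    }

    /** Assumption: an application directory is recognised by its name prefix. */
    static boolean isAppDir(Node n) {
        return n.isDir && n.name.startsWith("application_");
    }

    /** True when every file below the node was last modified before the cutoff. */
    static boolean allFilesOlderThan(Node n, long cutoff) {
        if (!n.isDir) {
            return n.modTime < cutoff;
        }
        for (Node child : n.children) {
            if (!allFilesOlderThan(child, cutoff)) {
                return false;
            }
        }
        return true;
    }

    /** Walks the tree and removes expired application directories, outermost first. */
    static void clean(Node dir, String path, long cutoff, List<String> deleted) {
        Iterator<Node> it = dir.children.iterator();
        while (it.hasNext()) {
            Node child = it.next();
            if (!child.isDir) {
                continue;
            }
            String childPath = path + "/" + child.name;
            if (isAppDir(child)) {
                if (allFilesOlderThan(child, cutoff)) {
                    it.remove(); // stands in for a recursive delete of the subtree
                    deleted.add(childPath);
                }
                // Never descend into an application directory: a nested
                // appid/appid/... copy is covered by the outermost one.
            } else {
                clean(child, childPath, cutoff, deleted);
            }
        }
    }

    public static void main(String[] args) {
        Node nested = Node.dir("application_1_0001",
                Node.file("summarylog", 10L),
                Node.dir("application_1_0001", Node.file("domainlog", 20L)));
        Node root = Node.dir("done", Node.dir("1", Node.dir("018", nested)));
        List<String> deleted = new ArrayList<>();
        clean(root, "/ats/done", 100L, deleted);
        System.out.println("deleted: " + deleted);
    }
}
```

The point of the design is that the walk stops at the first application directory it meets, so a nested copy of the same directory is never listed after its parent has already been removed.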
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640125#comment-16640125 ]

Wangda Tan commented on YARN-8627:
----------------------------------

Thanks [~rohithsharma] for reviewing the patch. [~tarunparimi], could you check the last comment?
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626792#comment-16626792 ]

Rohith Sharma K S commented on YARN-8627:
-----------------------------------------

bq. One thing I noticed is that only the "domainlog" file was present in these type of repeated appid directories. Other types such as summarylog/entitylog were present only in the normal expected directory structure.
Interesting!! It means retrieving these entities _*had a problem, i.e. with ACLs, but unidentified*_. Can we see the content of both domain log files? I would still like to find the root cause of this issue! I suspect something is going wrong while moving from the active to the done directory.

Some doubts:
# Is this happening in HDFS or any other filesystem?
# Is the cluster enabled with the *yarn.timeline-service.entity-group-fs-store.with-user-dir* flag?
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626418#comment-16626418 ]

Wangda Tan commented on YARN-8627:
----------------------------------

[~rohithsharma], could you help to review the latest comment?
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620464#comment-16620464 ]

Tarun Parimi commented on YARN-8627:

Thanks for the review [~rohithsharma]. I tested the folder path appid/appid/appid and this patch handles it fine. This is because only the first appid directory encountered is deleted recursively, after its child directories have been checked for modification time.

I agree that we should try to find the root cause for the creation of the repeated directories. I wasn't able to reproduce this locally, so I couldn't dig much deeper. I looked at the hadoop fs -ls -R output of /ats/done for the cluster in which I observed the issue.

One thing I noticed is that only the "domainlog" file was present in these repeated appid directories. Other types such as summarylog/entitylog were present only in the normal expected directory structure. Also, two domainlogs are created with different sizes, and the modification time of the one causing the problem is much later, at 13:00. But I am not sure of the exact scenario that causes this to happen. A sample is below.

{code:java}
drwxrwx--- - appuser hadoop 0 2017-10-16 13:01 /ats/done/1508116310016//000/application_1508116310016_0010
drwxrwx--- - appuser hadoop 0 2017-10-16 12:16 /ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 88 2017-10-16 12:20 /ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01/domainlog-appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 92324 2017-10-16 12:22 /ats/done/1508116310016//000/application_1508116310016_0010/appattempt_1508116310016_0010_01/summarylog-appattempt_1508116310016_0010_01
drwxrwxrwx - appuser hadoop 0 2017-10-16 13:00 /ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010
drwxrwxrwx - appuser hadoop 0 2017-10-16 13:00 /ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010/appattempt_1508116310016_0010_01
-rw-r- 3 appuser hadoop 90 2017-10-16 13:00 /ats/done/1508116310016//000/application_1508116310016_0010/application_1508116310016_0010/appattempt_1508116310016_0010_01/domainlog-appattempt_1508116310016_0010_01
{code}
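The modification-time check described in the comment above can be sketched roughly as follows. This is an illustration on the local filesystem with java.nio.file, not the actual EntityGroupFSTimelineStore code; the class name AppDirAgeSketch, the method shouldClean and the cutoff parameter are all hypothetical. It shows why a newer domainlog inside a nested appid directory keeps the whole outer appid directory from being considered old.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

// Hypothetical sketch; names and layout are assumptions, not Hadoop code.
class AppDirAgeSketch {

  // Returns true only if every regular file under appDir (at any depth,
  // including nested appid directories) was last modified before cutoffMillis.
  static boolean shouldClean(Path appDir, long cutoffMillis) throws IOException {
    try (Stream<Path> walk = Files.walk(appDir)) {
      for (Path p : (Iterable<Path>) walk::iterator) {
        if (Files.isRegularFile(p)
            && Files.getLastModifiedTime(p).toMillis() >= cutoffMillis) {
          return false; // a recent file anywhere in the subtree blocks cleanup
        }
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path app = Files.createTempDirectory("done").resolve("application_1_0010");
    Path nested = app.resolve("application_1_0010").resolve("appattempt_1_0010_01");
    Files.createDirectories(nested);
    Path log = Files.createFile(nested.resolve("domainlog"));
    Files.setLastModifiedTime(log, FileTime.fromMillis(System.currentTimeMillis()));
    // The nested domainlog was just written, so a cutoff in the past rejects the directory.
    System.out.println(shouldClean(app, System.currentTimeMillis() - 60_000L));
  }
}
```

Because the walk covers the whole subtree, the decision for the outer appid directory already accounts for any appid/appid/appid nesting, so a single recursive delete of the outer directory is enough once the check passes.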
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620341#comment-16620341 ]

Rohith Sharma K S commented on YARN-8627:

Thanks [~tarunparimi] for the patch. One question: how did "/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219" get created? I think we should find the root cause for this. I suspect the move from the active to the done directory has a problem. What will happen if the folder path is appid/appid/appid?
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578293#comment-16578293 ]

Tarun Parimi commented on YARN-8627:

[~rohithsharma], please review the patch when you are free.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571082#comment-16571082 ]

genericqa commented on YARN-8627:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 51s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934581/YARN-8627.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c2d1528981e6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e4e02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21527/testReport/ |
| Max. process+thread count | 316 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21527/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571043#comment-16571043 ]

Tarun Parimi commented on YARN-8627:

Adding a test case in TestEntityGroupFSTimelineStore#testCleanLogs to check the cleaning of an app directory with multiple attempt dirs and an app dir within an app dir.
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570274#comment-16570274 ]

genericqa commented on YARN-8627:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8627 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934478/YARN-8627.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1c103bfa9c2d 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bcfc985 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21516/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21516/console |
| Powered by | Apache
[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
[ https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569873#comment-16569873 ]

Tarun Parimi commented on YARN-8627:

On further analysis, I found that this error occurs for application directories which themselves contain another application directory, such as /ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219. In the audit logs I see that ATS tries to list the folder after deleting it, which causes the error.

{code:java}
19:35:45,944 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,945 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,946 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219/appattempt_1500089190015_17219_01 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,947 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,948 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219/application_1500089190015_17219/appattempt_1500089190015_17219_01 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,952 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/ats/done/1500089190015//017/application_1500089190015_17219 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
19:35:45,953 INFO FSNamesystem.audit: allowed=true ugi=yarn/rm-u...@example.com (auth:KERBEROS) ip=/x.x.x.x cmd=listStatus src=/ats/done/1500089190015//017/application_1500089190015_17219 dst=null perm=null proto=rpc callerContext=yarn_ats_server_v1_5
{code}

I am not sure how this directory structure got created in the first place. But the cleaner thread should not list a directory after deleting it. The {{EntityGroupFSTimelineStore#cleanLogs}} method tries to delete the parent directory {{dirpath}} while it is iterating over the same dirpath. It should only try to delete its children so as to avoid these issues.

I am testing a patch which does this in my environment, and it seems to fix the issue. I will upload a patch after doing further tests.
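The approach proposed in the comment above (delete only children of the directory being iterated, and never list a path after deleting it) can be sketched as below. This is a local-filesystem illustration with java.nio.file, not the actual patch and not the Hadoop FileSystem API; the class DoneDirCleanerSketch, its method names and the "application_" prefix test are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the EntityGroupFSTimelineStore implementation.
class DoneDirCleanerSketch {

  // Deletes a directory tree, children before parents.
  static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  // Cleans app directories below dirPath and returns the paths it deleted.
  // dirPath itself is never deleted while its children are being iterated.
  static List<Path> cleanLogs(Path dirPath, Predicate<Path> shouldClean) throws IOException {
    List<Path> children = new ArrayList<>();
    try (DirectoryStream<Path> listing = Files.newDirectoryStream(dirPath)) {
      for (Path child : listing) {
        children.add(child); // snapshot first, so no listing is open during deletes
      }
    }
    List<Path> deleted = new ArrayList<>();
    for (Path child : children) {
      if (!Files.isDirectory(child)) {
        continue;
      }
      if (child.getFileName().toString().startsWith("application_")) {
        if (shouldClean.test(child)) {
          deleteRecursively(child); // removes any nested appid directory too
          deleted.add(child);       // do not descend or list it again
        }
      } else {
        deleted.addAll(cleanLogs(child, shouldClean)); // bucket dirs: recurse
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("ats-done");
    Path outer = root.resolve("1500089190015").resolve("017")
        .resolve("application_1500089190015_17219");
    Files.createDirectories(outer.resolve("application_1500089190015_17219"));
    System.out.println(cleanLogs(root, p -> true));
  }
}
```

In this sketch the nested appid directory is never visited on its own: the outer appid directory is the first one encountered, so it is removed in one recursive delete and the walk moves on to its siblings.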