[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209500#comment-15209500 ] Jason Lowe commented on HADOOP-12107: - This appears to be triggering OOM errors in Tez during task transitions in container reuse for some scenarios, see HADOOP-12958 for details. We had at least one job that would fail with an OOM every time during container reuse, and it passes once this change was reverted. > long running apps may have a huge number of StatisticsData instances under > FileSystem > - > > Key: HADOOP-12107 > URL: https://issues.apache.org/jira/browse/HADOOP-12107 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, > HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch > > > We observed with some of our apps (non-mapreduce apps that use filesystems) > that they end up accumulating a huge memory footprint coming from > {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of > {{Statistics}}). > Although the thread reference from {{StatisticsData}} is a weak reference, > and thus can get cleared once a thread goes away, the actual > {{StatisticsData}} instances in the list won't get cleared until any of these > following methods is called on {{Statistics}}: > - {{getBytesRead()}} > - {{getBytesWritten()}} > - {{getReadOps()}} > - {{getLargeReadOps()}} > - {{getWriteOps()}} > - {{toString()}} > It is quite possible to have an application that interacts with a filesystem > but does not call any of these methods on the {{Statistics}}. If such an > application runs for a long time and has a large amount of thread churn, the > memory footprint will grow significantly. > The current workaround is either to limit the thread churn or to invoke these > operations occasionally to pare down the memory. However, this is still a > deficiency with {{FileSystem$Statistics}} itself in that the memory is > controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099221#comment-15099221 ] Hudson commented on HADOOP-12107: - FAILURE: Integrated in Hadoop-trunk-Commit #9116 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9116/]) Update CHANGES.txt for commit of HADOOP-12107 to branch-2.7 and (jlowe: rev 651c23e8ef8aeafd999249ce57b31e689bd2ece6) * hadoop-common-project/hadoop-common/CHANGES.txt > long running apps may have a huge number of StatisticsData instances under > FileSystem > - > > Key: HADOOP-12107 > URL: https://issues.apache.org/jira/browse/HADOOP-12107 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, > HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch > > > We observed with some of our apps (non-mapreduce apps that use filesystems) > that they end up accumulating a huge memory footprint coming from > {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of > {{Statistics}}). > Although the thread reference from {{StatisticsData}} is a weak reference, > and thus can get cleared once a thread goes away, the actual > {{StatisticsData}} instances in the list won't get cleared until any of these > following methods is called on {{Statistics}}: > - {{getBytesRead()}} > - {{getBytesWritten()}} > - {{getReadOps()}} > - {{getLargeReadOps()}} > - {{getWriteOps()}} > - {{toString()}} > It is quite possible to have an application that interacts with a filesystem > but does not call any of these methods on the {{Statistics}}. If such an > application runs for a long time and has a large amount of thread churn, the > memory footprint will grow significantly. > The current workaround is either to limit the thread churn or to invoke these > operations occasionally to pare down the memory. However, this is still a > deficiency with {{FileSystem$Statistics}} itself in that the memory is > controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096632#comment-15096632 ] Jason Lowe commented on HADOOP-12107: - I tried pulling this into branch-2.7. It comes in cleanly except for CHANGES.txt, but the unit test failed. Subsequent runs showed it was flaky, sometimes passing and sometimes not. Looking around I noticed a number of precommit builds have been complaining about this test. I can reproduce the flaky test on trunk, so I filed HADOOP-12706. I'd really like this fix backported, but I'd also rather not add another flaky test to 2.7 and 2.6. [~sjlee0] or [~mingma] could you take a look into the unit test? > long running apps may have a huge number of StatisticsData instances under > FileSystem > - > > Key: HADOOP-12107 > URL: https://issues.apache.org/jira/browse/HADOOP-12107 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Fix For: 2.8.0 > > Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, > HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch > > > We observed with some of our apps (non-mapreduce apps that use filesystems) > that they end up accumulating a huge memory footprint coming from > {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of > {{Statistics}}). > Although the thread reference from {{StatisticsData}} is a weak reference, > and thus can get cleared once a thread goes away, the actual > {{StatisticsData}} instances in the list won't get cleared until any of these > following methods is called on {{Statistics}}: > - {{getBytesRead()}} > - {{getBytesWritten()}} > - {{getReadOps()}} > - {{getLargeReadOps()}} > - {{getWriteOps()}} > - {{toString()}} > It is quite possible to have an application that interacts with a filesystem > but does not call any of these methods on the {{Statistics}}. If such an > application runs for a long time and has a large amount of thread churn, the > memory footprint will grow significantly. > The current workaround is either to limit the thread churn or to invoke these > operations occasionally to pare down the memory. However, this is still a > deficiency with {{FileSystem$Statistics}} itself in that the memory is > controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089972#comment-15089972 ] Jason Lowe commented on HADOOP-12107: - I recently ran across this on a NodeManager running 2.6 that had been up for a while. Any objections to this being picked back to 2.6 and 2.7? > long running apps may have a huge number of StatisticsData instances under > FileSystem > - > > Key: HADOOP-12107 > URL: https://issues.apache.org/jira/browse/HADOOP-12107 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Fix For: 2.8.0 > > Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, > HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch > > > We observed with some of our apps (non-mapreduce apps that use filesystems) > that they end up accumulating a huge memory footprint coming from > {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of > {{Statistics}}). > Although the thread reference from {{StatisticsData}} is a weak reference, > and thus can get cleared once a thread goes away, the actual > {{StatisticsData}} instances in the list won't get cleared until any of these > following methods is called on {{Statistics}}: > - {{getBytesRead()}} > - {{getBytesWritten()}} > - {{getReadOps()}} > - {{getLargeReadOps()}} > - {{getWriteOps()}} > - {{toString()}} > It is quite possible to have an application that interacts with a filesystem > but does not call any of these methods on the {{Statistics}}. If such an > application runs for a long time and has a large amount of thread churn, the > memory footprint will grow significantly. > The current workaround is either to limit the thread churn or to invoke these > operations occasionally to pare down the memory. However, this is still a > deficiency with {{FileSystem$Statistics}} itself in that the memory is > controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090150#comment-15090150 ] Sangjin Lee commented on HADOOP-12107: -- +1 > long running apps may have a huge number of StatisticsData instances under > FileSystem > - > > Key: HADOOP-12107 > URL: https://issues.apache.org/jira/browse/HADOOP-12107 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Fix For: 2.8.0 > > Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, > HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch > > > We observed with some of our apps (non-mapreduce apps that use filesystems) > that they end up accumulating a huge memory footprint coming from > {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of > {{Statistics}}). > Although the thread reference from {{StatisticsData}} is a weak reference, > and thus can get cleared once a thread goes away, the actual > {{StatisticsData}} instances in the list won't get cleared until any of these > following methods is called on {{Statistics}}: > - {{getBytesRead()}} > - {{getBytesWritten()}} > - {{getReadOps()}} > - {{getLargeReadOps()}} > - {{getWriteOps()}} > - {{toString()}} > It is quite possible to have an application that interacts with a filesystem > but does not call any of these methods on the {{Statistics}}. If such an > application runs for a long time and has a large amount of thread churn, the > memory footprint will grow significantly. > The current workaround is either to limit the thread churn or to invoke these > operations occasionally to pare down the memory. However, this is still a > deficiency with {{FileSystem$Statistics}} itself in that the memory is > controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610306#comment-14610306 ] Hudson commented on HADOOP-12107: - FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/]) HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610348#comment-14610348 ] Hudson commented on HADOOP-12107: - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/]) HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608516#comment-14608516 ] Hudson commented on HADOOP-12107: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2190 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2190/]) HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-common-project/hadoop-common/CHANGES.txt long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608460#comment-14608460 ] Hudson commented on HADOOP-12107: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #242 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/242/]) HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609256#comment-14609256 ] Colin Patrick McCabe commented on HADOOP-12107: --- Thanks, guys. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606431#comment-14606431 ] Gera Shegalov commented on HADOOP-12107: bq. You could make the same argument to stop development on almost any patch. I disagree with such a strong statement. It's not the case in my experience. Thanks for pointing out the compatibility document. It gives us a formal basis to go on, and not delay [~sjlee0]'s important fix. Maybe one day we'll have a compatibility test suite based on that doc. bq. It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs We develop new Hadoop features and often they do not make it upstream immediately. It happens that we have classes in their intended packages but we can deal with this. We are not affected by this particular change, either. +1 for both trunk and branch 2. [~mingma], do you want to exercise your committer rights :) ? long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606271#comment-14606271 ] Colin Patrick McCabe commented on HADOOP-12107: --- Guys, we clearly define the API contract for the project. See http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html You have to remember that: 1. The function that you are talking about changing (the constructor) is not public from Java's point of view. It is package-private. 2. The function that you are talking about changing is not public from Hadoop's point of view (there is no \@Public or \@LimitedPrivate annotation on it) There is simply no reason to treat this as public. bq. However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even though the user absolutely should not do it. You could make the same argument to stop development on almost any patch. Almost every patch changes things which are private or package-private inside Hadoop. It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs namespace (or any other internal project namespace) long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606495#comment-14606495 ] Hudson commented on HADOOP-12107: - FAILURE: Integrated in Hadoop-trunk-Commit #8089 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8089/]) HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java * hadoop-common-project/hadoop-common/CHANGES.txt long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606514#comment-14606514 ] Ming Ma commented on HADOOP-12107: -- Also thanks for [~walter.k.su] and [~sandyr] for the review and suggestion. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606509#comment-14606509 ] Ming Ma commented on HADOOP-12107: -- I have committed this to trunk and branch-2. Thanks [~sjlee0] for the contribution and [~jira.shegalov] and [~cmccabe] for the code review! long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606552#comment-14606552 ] Sangjin Lee commented on HADOOP-12107: -- Thanks [~mingma] for the commit! Many thanks to [~jira.shegalov] and [~cmccabe] for the invaluable review. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 2.8.0 Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603218#comment-14603218 ] Sandy Ryza commented on HADOOP-12107: - I don't know of anyone using the StatisticsData constructor. I originally made the class public for Spark's needs and it shouldn't ever need that constructor. This is neither an endorsement nor a -1 against removing the constructor. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603211#comment-14603211 ] Sangjin Lee commented on HADOOP-12107: -- [~cmccabe], your thoughts on this? {{StatisticsData}} was made public in 2.5 (HADOOP-10688), and I doubt there would be an actual case of using the constructor. IMO, removing the constructor (and the member) seems pretty safe, in theory and practice. [~sandyr], what do you think? long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603216#comment-14603216 ] Sangjin Lee commented on HADOOP-12107: -- FWIW, when I look at the spark code (which may have been the main reason for changes in HADOOP-10688), it doesn't look like it's using the constructor or the owner member variable: https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603433#comment-14603433 ] Colin Patrick McCabe commented on HADOOP-12107: --- OK, at the risk of being pedantic, here is my rundown. While the {{StatisticsData}} class itself is public, the {{StatisticsData}} constructor is not. It is package-private (the access class which things get in Java if there is no public, private, or protected keyword on them.) This means that a {{StatisticsData}} object can only be created by code in the {{org.apache.hadoop.fs}} package. You can try this for yourself-- write a program external to hadoop that tries to create a {{StatisticsData}} object via this constructor. It will not compile. This constructor is safe to remove, so let's do that. bq. Colin Patrick McCabe, good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least? No. Users simply can't add code to the {{org.apache.hadoop.fs}} package. If they do, things are not going to work-- there are going to be naming conflicts, class resolution issues, etc. etc. There is no possible way we can support users doing this and no reason to support it. If we tried, we would have to essentially freeze the API of every single class in Hadoop-- we would have to re-have this discussion each time we changed some package-private variable or function. Private and package-private stuff is private-- it's even enforced by the compiler, you can't get much more private than that. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603982#comment-14603982 ] Gera Shegalov commented on HADOOP-12107: I agree with the sentiment what API isolation contract should be. However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some {{org.apache.hadoop.fs.Foo}} calling the constructor even though the user absolutely should not do it. Regarding freezing the API, given that the question was only about branch-2, it does not sound negative to me. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602230#comment-14602230 ] Gera Shegalov commented on HADOOP-12107: bq StatisticsData is a public class, but its constructor is not public. [~cmccabe], good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least? long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600376#comment-14600376 ] Sangjin Lee commented on HADOOP-12107: -- I believe the test failure is unrelated. Also, the said test passes fine locally. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600361#comment-14600361 ] Hadoop QA commented on HADOOP-12107: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 6s | The applied patch generated 1 new checkstyle issues (total was 142, now 140). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 21m 33s | Tests failed in hadoop-common. | | | | 60m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.security.ssl.TestReloadingX509TrustManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741705/HADOOP-12107.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / afe9ea3 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/console | This message was automatically generated. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch, HADOOP-12107.005.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599861#comment-14599861 ] Colin Patrick McCabe commented on HADOOP-12107: --- bq. 003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem object? I look at FileSystem#getStaticstics(..). I think it creates one Staticstics object per FileSystem class, so 001 patch creates one additional thread per FileSystem class? I'll be grateful if somebody can guide me through this. D'oh. You're absolutely right... there is only one thread per FileSystem class. Forget what I said earlier. bq. One other point of discussion: I'm removing the StatisticsData constructor that takes a Thread as the argument (along with the owner member variable) as it no longer needs the thread inside StatisticsData. But since StatisticsData is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think. StatisticsData is a public class, but its constructor is not public. So there is no possible API breakage here. (As a side note, even if StatisticsData were public, it doesn't have an interface annotation specifying that it is safe to use outside of Hadoop, so we're doubly safe here.) {code} 2921 /** 2922 * This constructor is deprecated and is no longer used. One should remove 2923 * any use of this constructor. 2924 */ 2925 @Deprecated 2926 StatisticsData(WeakReferenceThread owner) { {code} Like I explained above, we don't need this. Let's get rid of it. StatisticsDataReference: let's put an {{\@Override}} annotation on the functions which implement {{PhantomReference}} APIs. {code} 142 final int maxSeconds = 10; 143 for (int i = 0; i maxSeconds; i++) { 144 Thread.sleep(1000L); 145 allDataSize = stats.getAllThreadLocalDataSize(); 146 if (allDataSize == 0) { 147 LOG.info(cleaned up after + (i+1) + seconds); 148 break; 149 } else { 150 LOG.info(not cleaned up after + (i+1) + seconds; waiting...); 151 } 152 } {code} Let's use {{GenericTestUtils#waitFor}} here. +1 once those are addressed. Thanks, [~sjlee0]. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597851#comment-14597851 ] Sangjin Lee commented on HADOOP-12107: -- Thanks for pointing that out [~walter.k.su]. I think you're right in that normally one {{Statistics}} instance is created per {{FileSystem}} class. That said, it is possible to create {{Statistics}} instances in other manners simply via invoking the public constructor. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598388#comment-14598388 ] Ming Ma commented on HADOOP-12107: -- Thanks [~sjlee0]. Latest patch LGTM. [~cmccabe], [~jira.shegalov] any additional comments? long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598397#comment-14598397 ] Gera Shegalov commented on HADOOP-12107: Thanks for v3 [~sjlee0]! *FileSystem.java:* *{{getThreadStatistics}}:* Minimize the code executed under the monitor. Pull reference creation out of {{synchronized}} similar to what it was before. Note that currentThread is a native call. *{{Cleaner#run}}* Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread. Nits: can we be more specific in the naming, to the tune of: STATS_DATA_CLEANER, STATS_DATA_REFQUEUE, StatsDataCleaner. *{{testStatisticsThreadLocalDataCleanUp}}* Since the test uses waits, pass some reasonable timeout to {{@Test}} make 'int size' and 'int maxSeconds' final. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597297#comment-14597297 ] Walter Su commented on HADOOP-12107: bq. This will create one additional thread per FileSystem object. 003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem *object*? I look at {{FileSystem#getStaticstics(..)}}. I think it creates one Staticstics object per FileSystem *class*, so 001 patch creates one additional thread per FileSystem *class*? I'll be grateful if somebody can guide me through this. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598617#comment-14598617 ] Gera Shegalov commented on HADOOP-12107: +1, 004 LGTM long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598570#comment-14598570 ] Hadoop QA commented on HADOOP-12107: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 18s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 4s | The applied patch generated 1 new checkstyle issues (total was 142, now 141). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 21m 41s | Tests passed in hadoop-common. | | | | 60m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741393/HADOOP-12107.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 122cad6 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/console | This message was automatically generated. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch, HADOOP-12107.004.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596117#comment-14596117 ] Ming Ma commented on HADOOP-12107: -- Thanks, [~sjlee0], great find. It overall looks good to me. * Is there a good way to write any unit test for this? * This will create one additional thread per FileSystem object. It is fine for most scenarios where one FileSystem object is shared among different threads in the same JVM. If the application creates many FileSystem objects in same JVM, this will create lots of additional threads. But it doesn't seem like something worthwhile optimizing for. Just bring it up in case. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596543#comment-14596543 ] Colin Patrick McCabe commented on HADOOP-12107: --- Thanks for looking at this, [~sjlee0]. It looks like a good idea to fix this. As [~mingma] commented, I think we need to have a single thread per JVM to do this, rather than one thread per FileSystem object. We have a number of Hadoop applications which create lots of FileSystem objects, and creating a thread per FS object would just be too many threads. We could end up with a thread leak which was worse than the memory leak described here. (I agree that well-written applications should strive to avoid creating too many FileSystem objects, but that's a separate issue...) {code} private synchronized T T visitAll(StatisticsAggregatorT visitor) { visitor.accept(rootData); if (allData != null) { {code} The null check isn't needed any more since you changed allData to be initialized in the constructor. thanks long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596951#comment-14596951 ] Gera Shegalov commented on HADOOP-12107: Between a single thread and a thread per FileSystem object, we can compromise on having a configurable single thread pool for the class FileSystem. Let us maintain fool-proof compatibility to make sure that we can safely backport the patch, and just deprecate the old constructor. I don't see a test covering the scenario in this JIRA. It would be good to add a test to {{fs.FCStatisticsBaseTest}} long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596806#comment-14596806 ] Sangjin Lee commented on HADOOP-12107: -- Thanks [~mingma] and [~cmccabe] for your comments! bq. Is there a good way to write any unit test for this? I was relying on existing unit tests on {{FileSystem.Statistics}} (there are several). I'm still hoping that is good enough for this, but let me know if you want me to look into creating more unit tests around this. Regarding the number of threads for the cleaner, yes, it definitely occurred to me that this might create an issue in terms of potentially a large number of additional threads. The only reason I shied away from a global cleaner thread initially is I wasn't entirely sure whether it would be completely safe and perform well in terms of locking (note that the cleaner needs to acquire a lock for each {{Statistics}} instance). Let me explore that still. bq. The null check isn't needed any more since you changed allData to be initialized in the constructor. Thanks for pointing that out. I remembered it at one point, but forgot to include that change in the patch. One other point of discussion: I'm removing the {{StatisticsData}} constructor that takes a Thread as the argument (along with the {{owner}} member variable) as it no longer needs the thread inside {{StatisticsData}}. But since {{StatisticsData}} is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside {{FileSystem}} is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597146#comment-14597146 ] Hadoop QA commented on HADOOP-12107: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 5s | The applied patch generated 1 new checkstyle issues (total was 142, now 141). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 21m 48s | Tests passed in hadoop-common. | | | | 60m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741209/HADOOP-12107.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/console | This message was automatically generated. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597153#comment-14597153 ] Sangjin Lee commented on HADOOP-12107: -- The checkstyle warning: {panel} ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1: File length is 3,415 lines (max allowed is 2,000). {panel} Nothing I can do about that. :) Your review is greatly appreciated. Thanks! long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch, HADOOP-12107.003.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597058#comment-14597058 ] Hadoop QA commented on HADOOP-12107: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 18s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 5s | The applied patch generated 3 new checkstyle issues (total was 142, now 143). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 21m 56s | Tests passed in hadoop-common. | | | | 60m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741198/HADOOP-12107.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/console | This message was automatically generated. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594570#comment-14594570 ] Hadoop QA commented on HADOOP-12107: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 4s | The applied patch generated 1 new checkstyle issues (total was 142, now 140). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 56s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 21m 55s | Tests passed in hadoop-common. | | | | 61m 3s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-common | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740755/HADOOP-12107.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 20c03c9 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/diffcheckstylehadoop-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/console | This message was automatically generated. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HADOOP-12107.001.patch We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12107) long running apps may have a huge number of StatisticsData instances under FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594054#comment-14594054 ] Sangjin Lee commented on HADOOP-12107: -- The key method in clearing this memory is in {{Statistics.visitAll()}}: {code} private synchronized T T visitAll(StatisticsAggregatorT visitor) { visitor.accept(rootData); if (allData != null) { for (IteratorStatisticsData iter = allData.iterator(); iter.hasNext(); ) { StatisticsData data = iter.next(); visitor.accept(data); if (data.owner.get() == null) { /* * If the thread that created this thread-local data no * longer exists, remove the StatisticsData from our list * and fold the values into rootData. */ rootData.add(data); iter.remove(); } } } return visitor.aggregate(); } {code} As part of running the visitor, it checks to see if the underlying thread is gone, and if so, adds the data for that thread to {{rootData}} and removes the instance from the list. This pattern almost literally cries out for using a {{PhantomReference}}. That way, we can perform this operation as soon as the garbage collector clears up the threads. I'll draw up a patch based on that idea soon. long running apps may have a huge number of StatisticsData instances under FileSystem - Key: HADOOP-12107 URL: https://issues.apache.org/jira/browse/HADOOP-12107 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.7.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of {{Statistics}}). Although the thread reference from {{StatisticsData}} is a weak reference, and thus can get cleared once a thread goes away, the actual {{StatisticsData}} instances in the list won't get cleared until any of these following methods is called on {{Statistics}}: - {{getBytesRead()}} - {{getBytesWritten()}} - {{getReadOps()}} - {{getLargeReadOps()}} - {{getWriteOps()}} - {{toString()}} It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the {{Statistics}}. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly. The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with {{FileSystem$Statistics}} itself in that the memory is controlled only as a side effect of those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)