     [ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang reassigned HDFS-14492:
--------------------------------------

    Assignee: Wei-Chiu Chuang

> Snapshot memory leak
> --------------------
>
>                 Key: HDFS-14492
>                 URL: https://issues.apache.org/jira/browse/HDFS-14492
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.6.0
>         Environment: CDH5.14.4
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user, 
> trying to trim some fat, and sure enough we found a memory leak in it: when 
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287 
> million blocks, 13 million directories), using around 132 GB of heap.
> While only 44.5 million files have snapshotted copies 
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have 
> FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies 
> at some point in the past, but after the snapshots were removed, those data 
> structures are still kept in the heap.
> INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, 
> FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly 
> in large clusters like this. In this cluster, 13.8 GB of memory could have 
> been saved if not for this bug: 
> (32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ 13.8 GB. 
> That is more than 10% of the heap size.
> Heap histogram for reference:
> {noformat}
>  num  #instances       #bytes  class name
> ----------------------------------------------
>   1:   286418254  27496152384  org.apache.hadoop.hdfs.server.namenode.INodeFile
>   2:   287322227  18388622528  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>   3:   227899550  17144816120  [B
>   4:   287324031  13769408616  [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>   5:    71352116  12353841568  [Ljava.lang.Object;
>   6:   286322650   9170335840  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>   7:   235632329   7658462416  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   8:           4   7046430816  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>   9:   211997769   6783928608  org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>  10:   211997769   5087946456  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>  11:    76586261   3780468856  [I
>  12:    44572380   3209211360  org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>  13:    58634517   2345380680  java.util.ArrayList
>  14:    44572380   2139474240  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>  15:    76582416   1837977984  org.apache.hadoop.hdfs.server.namenode.AclFeature
>  16:    12907668   1135874784  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
> {noformat}
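
The savings estimate in the description can be re-derived from the histogram rows for FileWithSnapshotFeature and INodeFileAttributes$SnapshotCopy. The following is only a sketch of that arithmetic: the class and method names are made up for illustration, the per-object sizes are the averages quoted in the description, and the 13.8 figure is assumed to be in binary units (1024^3 bytes).

```java
import java.util.Locale;

/** Back-of-the-envelope estimate; names here are illustrative, not HDFS code. */
final class SnapshotLeakEstimate {
    // Instance counts copied from the heap histogram in the report.
    static final long FILES_WITH_SNAPSHOT_FEATURE = 211_997_769L;
    static final long FILES_WITH_SNAPSHOT_COPY = 44_572_380L;

    // Average sizes in bytes as quoted in the report. The 32.5-byte figure
    // corresponds to the INode$Feature[] array row of the histogram.
    static final double INODE_FEATURE_ARRAY_BYTES = 32.5;
    static final double FILE_WITH_SNAPSHOT_FEATURE_BYTES = 32.0;
    static final double FILE_DIFF_LIST_BYTES = 24.0;

    /** Inodes that still carry the feature but no longer have a snapshot copy. */
    static long staleInodes() {
        return FILES_WITH_SNAPSHOT_FEATURE - FILES_WITH_SNAPSHOT_COPY;
    }

    /** Heap bytes held by the three stale structures across all stale inodes. */
    static double wastedBytes() {
        double perInode = INODE_FEATURE_ARRAY_BYTES
                + FILE_WITH_SNAPSHOT_FEATURE_BYTES
                + FILE_DIFF_LIST_BYTES;
        return staleInodes() * perInode;
    }

    /** Converts bytes to GiB (assumption: the report's "gb" means 1024^3 bytes). */
    static double toGiB(double bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "stale inodes: %d%n", staleInodes());
        System.out.printf(Locale.ROOT, "wasted heap: %.1f GiB%n", toGiB(wastedBytes()));
    }
}
```

Under these assumptions the sketch reports 167425389 stale inodes and about 13.8 GiB, which agrees with the figure in the description.
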
> [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts?
> I am thinking that inside AbstractINodeDiffList#deleteSnapshotDiff(), in 
> addition to cleaning up file diffs, it should also remove 
> FileWithSnapshotFeature. I am not familiar with the snapshot 
> implementation, so any guidance is greatly appreciated.
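
The idea proposed above (drop the per-file snapshot feature once its last diff is gone) can be illustrated with a toy model. This is not HDFS code: ToyInodeFile and ToySnapshotFeature are hypothetical stand-ins, the method signature does not match AbstractINodeDiffList#deleteSnapshotDiff(), and the sketch ignores diff merging and any other state the real feature may carry. Whether removing the feature is safe in the real NameNode is exactly the open question in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: every class here is a hypothetical stand-in, not an HDFS class. */
final class ToySnapshotCleanup {

    /** Stand-in for the snapshot feature plus its diff list (one entry per snapshot). */
    static final class ToySnapshotFeature {
        final List<Integer> diffSnapshotIds = new ArrayList<>();
    }

    /** Stand-in for a file inode that may carry the snapshot feature. */
    static final class ToyInodeFile {
        private ToySnapshotFeature snapshotFeature; // null when no snapshot state exists

        /** Records that this file has a diff belonging to the given snapshot. */
        void recordDiff(int snapshotId) {
            if (snapshotFeature == null) {
                snapshotFeature = new ToySnapshotFeature();
            }
            snapshotFeature.diffSnapshotIds.add(snapshotId);
        }

        /** Removes the diff for a deleted snapshot and drops the feature if none remain. */
        void deleteSnapshotDiff(int snapshotId) {
            if (snapshotFeature == null) {
                return;
            }
            // Integer.valueOf selects remove(Object), not remove(int index).
            snapshotFeature.diffSnapshotIds.remove(Integer.valueOf(snapshotId));
            if (snapshotFeature.diffSnapshotIds.isEmpty()) {
                // The extra step suggested in the report: release the empty
                // feature instead of keeping it on the inode.
                snapshotFeature = null;
            }
        }

        boolean hasSnapshotFeature() {
            return snapshotFeature != null;
        }
    }

    public static void main(String[] args) {
        ToyInodeFile file = new ToyInodeFile();
        file.recordDiff(1);
        file.recordDiff(2);
        file.deleteSnapshotDiff(1);
        System.out.println("feature kept after first delete: " + file.hasSnapshotFeature());
        file.deleteSnapshotDiff(2);
        System.out.println("feature kept after last delete: " + file.hasSnapshotFeature());
    }
}
```

In this toy the feature survives while any diff remains and is released with the last one, so an inode whose snapshots have all been deleted goes back to carrying no snapshot state.
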



