linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled
URL: https://github.com/apache/spark/pull/24461#issuecomment-488114925

@vanzin the dump shows a `HashSet` under `FileSystem$Statistics` that is holding a lot of memory, and from the class definition there is only one `HashSet` in that class. I also think the `HashMap` shown there is misleading, because `HashSet`'s implementation uses a `HashMap` for lookups:

```java
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }
```

So that `HashMap` is just the `HashSet`'s internal data structure. I can look at the dump again, but I'm not sure I'll see anything more useful; I'll post back if I find anything new. That said, I agree this is less likely to be a Spark issue at this point.
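The claim that the `HashMap` in the dump is just the `HashSet`'s backing store can be checked directly with reflection, without touching any private state. A minimal sketch (the class name `HashSetBacking` is my own, not from the dump):

```java
import java.lang.reflect.Field;
import java.util.HashSet;

public class HashSetBacking {
    public static void main(String[] args) throws Exception {
        // HashSet declares a single private field named "map", and its
        // declared type is java.util.HashMap -- the only per-instance data
        // structure a HashSet holds. This is why heap-dump tools show a
        // HashMap nested under every HashSet instance.
        Field mapField = HashSet.class.getDeclaredField("map");
        System.out.println(mapField.getType().getName()); // java.util.HashMap
    }
}
```

Since only the field's declared type is read (no `setAccessible` call), this works on modern JDKs without any `--add-opens` flags.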