linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to
hadoop fs caching mechanism when eventLog is enabled
URL: https://github.com/apache/spark/pull/24461#issuecomment-488114925
@vanzin
the dumps show a `HashSet` under `FileSystem$Statistics` that's holding up lots
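For reference, one way to confirm that the Hadoop `FileSystem` cache itself is what keeps growing is a reflection probe like the sketch below. This assumes Hadoop 2.x internals (a private static `CACHE` field on `FileSystem` whose class holds a private `map` field), so treat it as a debugging aid, not a stable API:

```scala
import org.apache.hadoop.fs.FileSystem

// Hypothetical debugging probe (not part of the PR): count entries in the
// Hadoop FileSystem cache via reflection. Field names "CACHE" and "map"
// are Hadoop 2.x internals and may differ in other versions.
def cachedFileSystemCount(): Int = {
  val cacheField = classOf[FileSystem].getDeclaredField("CACHE")
  cacheField.setAccessible(true)
  val cache = cacheField.get(null)
  val mapField = cache.getClass.getDeclaredField("map")
  mapField.setAccessible(true)
  mapField.get(cache).asInstanceOf[java.util.Map[_, _]].size
}
```

Calling this between context restarts should show whether the entry count keeps climbing.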
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to
hadoop fs caching mechanism when eventLog is enabled
URL: https://github.com/apache/spark/pull/24461#issuecomment-488058505
Yeah, we run multiple contexts in sequence and that works well. However, looking at this
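To make the scenario concrete, here is a hedged sketch of the sequential-contexts setup being described (the loop count, master, and event-log path are illustrative, not from the PR):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Run several contexts back to back with event logging enabled. If the
// event-log FileSystem instance stays in Hadoop's cache after sc.stop(),
// driver memory grows a little with each iteration.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("fs-cache-leak-repro")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///tmp/spark-events") // illustrative path

for (i <- 1 to 100) {
  val sc = new SparkContext(conf)
  sc.parallelize(1 to 1000).count()
  sc.stop() // cached FileSystem entries can survive this
}
```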
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to
hadoop fs caching mechanism when eventLog is enabled
URL: https://github.com/apache/spark/pull/24461#issuecomment-488024150
@vanzin enabling eventLog alone does not cause the problem, like long-running streaming
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to
hadoop fs caching mechanism when eventLog is enabled
URL: https://github.com/apache/spark/pull/24461#issuecomment-487992657
Also, we've tested this with and without eventLog enabled, and the result is clear
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak
URL: https://github.com/apache/spark/pull/24461#issuecomment-487990769
Looked more into the Hadoop side: `FileSystem.get(URI uri, Configuration conf)` has caching built in, unless specifically disabled.
`return
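For context, a short sketch of the two standard ways to bypass that cache (the config key is the real per-scheme switch in Hadoop; the URI here is illustrative):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val uri = new URI("hdfs://namenode:8020/") // illustrative URI
val conf = new Configuration()

// Option 1: disable caching for the scheme, so get() builds a fresh instance.
conf.setBoolean("fs.hdfs.impl.disable.cache", true)
val fs1 = FileSystem.get(uri, conf)

// Option 2: keep the cache on, but ask for an uncached instance explicitly.
// The caller then owns the instance and must close() it.
val fs2 = FileSystem.newInstance(uri, conf)
```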
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak
URL: https://github.com/apache/spark/pull/24461#issuecomment-48740
@srowen
if there is indeed somewhere else that's holding that filesystem object, then you are right, this won't solve the problem. But if not,
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak
URL: https://github.com/apache/spark/pull/24461#issuecomment-487411885
@srowen the filesystem is shared, that's true, but it's shared within a Spark context, so if `context.close()` is called, I think we can safely remove
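As a minimal sketch of that idea (this is not the PR's exact change; the listener and its name are hypothetical): close the event-log `FileSystem` when the application ends, which also evicts it from Hadoop's shared cache:

```scala
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Hypothetical listener, for illustration only. FileSystem.close() removes
// the instance from Hadoop's shared cache, so nothing pins it afterwards.
class CloseEventLogFs(fs: FileSystem) extends SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    fs.close()
  }
}
```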