[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled

2019-04-30 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled URL: https://github.com/apache/spark/pull/24461#issuecomment-488114925 @vanzin the dumps show a HashSet under `FileSystem$Statistics` that's holding up lots
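One way to corroborate the heap-dump observation from inside the JVM is Hadoop's own statistics registry, which is the static state the retained HashSet lives under. A minimal Scala sketch, assuming a classpath with `hadoop-common` (the `println` formatting is illustrative; `getAllStatistics` is deprecated in newer Hadoop releases but still present):

```scala
import org.apache.hadoop.fs.FileSystem
import scala.collection.JavaConverters._

// List the per-scheme Statistics objects that FileSystem keeps in a static
// registry. Entries that survive after a context stops are the same kind of
// retained state the heap dump points at.
FileSystem.getAllStatistics.asScala.foreach { stats =>
  println(s"${stats.getScheme}: bytesRead=${stats.getBytesRead}, bytesWritten=${stats.getBytesWritten}")
}
```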

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled

2019-04-30 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled URL: https://github.com/apache/spark/pull/24461#issuecomment-488058505 Yeah, we run multiple contexts in sequence and that works well. However, looking at this
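For context, a minimal sketch of the usage pattern the comment describes: several contexts created and stopped one after another in the same JVM. The app names, master, and job body are placeholders; the point is that each context's event logging goes through a `FileSystem` served from the shared static cache, which outlives every individual context:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Several contexts in sequence within one JVM. Each one writes its event
// log through a FileSystem obtained from the static cache.
for (i <- 1 to 3) {
  val sc = new SparkContext(
    new SparkConf().setAppName(s"job-$i").setMaster("local[*]"))
  try sc.parallelize(1 to 100).count()
  finally sc.stop() // the cached FileSystem outlives this context
}
```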

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled

2019-04-30 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled URL: https://github.com/apache/spark/pull/24461#issuecomment-488024150 @vanzin enabling eventLog by itself alone does not cause the problem. Like long-running streaming
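For readers unfamiliar with the feature being discussed: event logging is what makes the driver hold a `FileSystem` handle for the log directory. A sketch of how it is turned on, using real Spark properties; the app name and HDFS path below are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enabling event logging makes the driver open a FileSystem for the
// configured log directory and keep it for the life of the application.
val conf = new SparkConf()
  .setAppName("streaming-app")                       // hypothetical name
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-events") // hypothetical path
val sc = new SparkContext(conf)
```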

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled

2019-04-30 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak due to hadoop fs caching mechanism when eventLog is enabled URL: https://github.com/apache/spark/pull/24461#issuecomment-487992657 Also, we've tested this with and without eventLog enabled, and the result is clear

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak

2019-04-30 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak URL: https://github.com/apache/spark/pull/24461#issuecomment-487990769 Looked more into the Hadoop side: `FileSystem.get(URI uri, Configuration conf)` has caching built inside, unless specifically disabled. `return
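For reference, the built-in cache the comment refers to can be avoided in two ways in the Hadoop `FileSystem` API. A hedged Scala sketch; the `hdfs:///` URI is a placeholder, and the disable key is per-scheme (`fs.<scheme>.impl.disable.cache`):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val hadoopConf = new Configuration()

// Option 1: opt a scheme out of the static cache, so get() stops
// returning the shared instance ("unless specifically disabled" above).
hadoopConf.setBoolean("fs.hdfs.impl.disable.cache", true)
val uncached = FileSystem.get(new URI("hdfs:///"), hadoopConf)

// Option 2: bypass the cache for a single call, regardless of configuration.
val fresh = FileSystem.newInstance(new URI("hdfs:///"), hadoopConf)
fresh.close() // safe to close: never shared through the cache
```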

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak

2019-04-28 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak URL: https://github.com/apache/spark/pull/24461#issuecomment-48740 @srowen if there is indeed somewhere else that's holding that filesystem object, then this won't solve the problem, you are right. But if not,

[GitHub] [spark] linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak

2019-04-28 Thread GitBox
linehrr commented on issue #24461: [SPARK-27434][CORE] Fix mem leak URL: https://github.com/apache/spark/pull/24461#issuecomment-487411885 @srowen the filesystem is shared, that's true, but it's shared within a Spark context, so if context.close() is called, I think we can safely remove
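A sketch of the idea in that comment, not the PR's actual change, assuming an existing `SparkContext` named `sc`: release the cached filesystems once the application ends. `FileSystem.closeAll()` is heavy-handed, so in a JVM hosting other contexts this would close their filesystems too:

```scala
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Once the application ends, drop every FileSystem this JVM has cached,
// which also releases the Statistics references they pin.
sc.addSparkListener(new SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
    FileSystem.closeAll()
})
```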