[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 LGTM too Thanks a lot! Merging with master, branch-2.1, branch-2.0 Has anyone heard of complaints of this in current use cases of earlier branches? If not, I won't backport it further than 2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley @holdenk @viirya PR updated --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15843 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69476/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15843 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15843 **[Test build #69476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69476/consoleFull)** for PR 15843 at commit [`37e83e8`](https://github.com/apache/spark/commit/37e83e88fe84dcfdcd04426c85499f7494cd688b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15843 **[Test build #69476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69476/consoleFull)** for PR 15843 at commit [`37e83e8`](https://github.com/apache/spark/commit/37e83e88fe84dcfdcd04426c85499f7494cd688b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley @holdenk will update the PR with changes today. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15843 I agree, for a follow up (so we don't lose track of it) - I've created SPARK-18630 but option 1 for now is a strict improvement over the current situation. Thanks for all of your work on this @techaddict --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 Sorry for the slow response on this. Given the time pressure for 2.1, let's go with option 1 for now with a follow-up task to implement option 2. It would be great to include this fix in 2.1. @techaddict will you have time to update your PR quickly? Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley @holdenk @techaddict Do we want to implement copy() in JavaWrapper in this PR too? Or separate it to follow ones with the required copy methods on JVM side? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15843 From the Py4J documentation it seems like we could be leaking memory with the first option, although perhaps not a lot of memory, but if it was being used in an iterative Python algorithm for training many models it could start to have some impact. I'd be in favor of option 2, but that could be done as a follow up issue if the required copy methods aren't generally available. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 I'd prefer option2 for safety since the model summaries should be an issue for GC. And looks like Java model summaries don't have copy method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 I don't see a need to do deep copies of model summaries, but I agree I don't like how JavaWrapper is ambiguous about whether it does shallow or deep copies. I'd say the confusion comes from us having a mix of immutable Java types (like model summaries) and mutable Java types (like Params subclasses). What do you think of these 2 options? * Distinguish mutability within Python wrappers: JavaWrapper is usable for immutable types. JavaParams (or other subtypes, if needed) is usable for mutable types. I.e., ```__del__``` and ```copy``` go in JavaParams. * Distinguish mutability within Java only: Use the same wrapper types for both in Python, and Java copy methods can do deep or shallow copies. I.e., in JavaWrapper, implement copy() which copies the Java instance, and implement ```__del__``` to release that instance's handle. I don't think either option does much for enforcing these semantics. Barring GC issues, I'd pick option 1 since it's simpler. But if option 2 is better for GC issues, then I'd vote for it. Thoughts? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 Oh. I thought JavaWrapper is only used on JavaParams. But there are also others like LogisticRegressionSummary which directly inherits JavaWrapper. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15843 I'm not so sure about that, we still would want to cleanup the underlying Java reference object on delete if it isn't needed anymore. I think the question is do we want to support shallow copy of javawrapper objects? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15843 @jkbradley Sounds making sense more. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15843 I'm now wondering if ```__del___``` should be in JavaParams instead of JavaWrapper since JavaWrapper does not override ```copy```. Do yall agree it will be safer if it's moved there? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org