[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15843
  
LGTM too
Thanks a lot!
Merging with master, branch-2.1, branch-2.0

Has anyone heard of complaints of this in current use cases of earlier 
branches?  If not, I won't backport it further than 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15843
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley @holdenk @viirya PR updated


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15843
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69476/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15843
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15843
  
**[Test build #69476 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69476/consoleFull)**
 for PR 15843 at commit 
[`37e83e8`](https://github.com/apache/spark/commit/37e83e88fe84dcfdcd04426c85499f7494cd688b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15843
  
**[Test build #69476 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69476/consoleFull)**
 for PR 15843 at commit 
[`37e83e8`](https://github.com/apache/spark/commit/37e83e88fe84dcfdcd04426c85499f7494cd688b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-30 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley @holdenk will update the PR with changes today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-29 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15843
  
I agree, for a follow up (so we don't lose track of it) - I've created 
SPARK-18630 but option 1 for now is a strict improvement over the current 
situation. Thanks for all of your work on this @techaddict


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-28 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15843
  
Sorry for the slow response on this.  Given the time pressure for 2.1, 
let's go with option 1 for now with a follow-up task to implement option 2.  It 
would be great to include this fix in 2.1.

@techaddict will you have time to update your PR quickly?  Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley @holdenk @techaddict Do we want to implement copy() in 
JavaWrapper in this PR too? Or separate it to follow ones with the required 
copy methods on JVM side?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-15 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15843
  
From the Py4J documentation it seems like we could be leaking memory with 
the first option, although perhaps not a lot of memory, but if it was being 
used in an iterative Python algorithm for training many models it could start 
to have some impact. I'd be in favor of option 2, but that could be done as a 
follow up issue if the required copy methods aren't generally available.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15843
  
I'd prefer option2 for safety since the model summaries should be an issue 
for GC. And looks like Java model summaries don't have copy method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-14 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15843
  
I don't see a need to do deep copies of model summaries, but I agree I 
don't like how JavaWrapper is ambiguous about whether it does shallow or deep 
copies.

I'd say the confusion comes from us having a mix of immutable Java types 
(like model summaries) and mutable Java types (like Params subclasses).  What 
do you think of these 2 options?
* Distinguish mutability within Python wrappers: JavaWrapper is usable for 
immutable types.  JavaParams (or other subtypes, if needed) is usable for 
mutable types.  I.e., ```__del__``` and ```copy``` go in JavaParams.
* Distinguish mutability within Java only: Use the same wrapper types for 
both in Python, and Java copy methods can do deep or shallow copies.  I.e., in 
JavaWrapper, implement copy() which copies the Java instance, and implement 
```__del__``` to release that instance's handle.

I don't think either option does much for enforcing these semantics.  
Barring GC issues, I'd pick option 1 since it's simpler.  But if option 2 is 
better for GC issues, then I'd vote for it.

Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15843
  
Oh. I thought JavaWrapper is only used on JavaParams. But there are also 
others like LogisticRegressionSummary which directly inherits JavaWrapper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15843
  
I'm not so sure about that, we still would want to cleanup the underlying 
Java reference object on delete if it isn't needed anymore. I think the 
question is do we want to support shallow copy of javawrapper objects?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley Sounds making sense more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-11 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15843
  
I'm now wondering if ```__del___``` should be in JavaParams instead of 
JavaWrapper since JavaWrapper does not override ```copy```.  Do yall agree it 
will be safer if it's moved there?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org