Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread Joseph Bradley
I saw the PR already, but only saw this just now. I think both persists are useful based on my experience, but it's very hard to say in general.

On Thu, Apr 23, 2015 at 12:22 PM, jimfcarroll wrote:
>
> Okay.
>
> PR: https://github.com/apache/spark/pull/5669
>
> Jira: https://issues.apache.org/j

Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread jimfcarroll
Okay.

PR: https://github.com/apache/spark/pull/5669
Jira: https://issues.apache.org/jira/browse/SPARK-7100

Hope that helps. Let me know if you need anything else.

Jim

Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread Sean Owen
Those are different RDDs that DecisionTree persists, though. It's not redundant.

On Thu, Apr 23, 2015 at 11:12 AM, jimfcarroll wrote:
> Hi Sean and Joe,
>
> I have another question.
>
> GradientBoostedTrees.run iterates over the RDD calling DecisionTree.run on
> each iteration with a new random s

Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread jimfcarroll
Hi Sean and Joe, I have another question. GradientBoostedTrees.run iterates over the RDD calling DecisionTree.run on each iteration with a new random sample from the input RDD. DecisionTree.run calls RandomForest.run, which also calls persist. One of these seems superfluous. Should I simply re

Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread Sean Owen
Only against master; it can be cherry-picked to other branches.

On Thu, Apr 23, 2015 at 10:53 AM, jimfcarroll wrote:
> Hi Joe,
>
> Do you want a PR per branch (one for master, one for 1.3)? Are you still
> maintaining 1.2? Do you need a Jira ticket per PR or can I submit them all
> under the same

Re: GradientBoostTrees leaks a persisted RDD

2015-04-23 Thread jimfcarroll
Hi Joe, Do you want a PR per branch (one for master, one for 1.3)? Are you still maintaining 1.2? Do you need a Jira ticket per PR or can I submit them all under the same ticket? Or should I just submit it to master and let you guys back-port it? Jim

Re: GradientBoostTrees leaks a persisted RDD

2015-04-22 Thread Joseph Bradley
Hi Jim,

You're right; that should be unpersisted. Could you please create a JIRA and submit a patch?

Thanks!
Joseph

On Wed, Apr 22, 2015 at 6:00 PM, jimfcarroll wrote:
> Hi all,
>
> It appears GradientBoostedTrees.scala can call 'persist' on an RDD and
> never unpersist it. In the master br
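A minimal sketch of the usual way to avoid this kind of leak: cache the input only if the caller has not already done so, and unpersist in a `finally` block only what was cached inside the function. It is written in Python against a hypothetical `ToyRDD` stand-in so it runs without a cluster. `ToyRDD`, `boost`, and the two-level `StorageLevel` enum are assumptions for illustration, not the code from the PR above. In Spark's Scala API the corresponding calls would be `RDD.getStorageLevel`, `persist(StorageLevel.MEMORY_AND_DISK)`, `unpersist()`, and the `StorageLevel.NONE` constant.

```python
from enum import Enum


class StorageLevel(Enum):
    # Hypothetical stand-in for Spark's StorageLevel; only two levels are modeled.
    NONE = "NONE"
    MEMORY_AND_DISK = "MEMORY_AND_DISK"


class ToyRDD:
    """Hypothetical stand-in for an RDD that only tracks its cache state."""

    def __init__(self, data):
        self.data = list(data)
        self.storage_level = StorageLevel.NONE

    def get_storage_level(self):
        return self.storage_level

    def persist(self, level):
        self.storage_level = level
        return self

    def unpersist(self):
        self.storage_level = StorageLevel.NONE
        return self


def boost(input_rdd, num_iterations=3):
    """Toy training loop (hypothetical) that caches its input only if needed."""
    # Remember whether *this* function is responsible for the cache entry.
    persisted_here = input_rdd.get_storage_level() == StorageLevel.NONE
    if persisted_here:
        input_rdd.persist(StorageLevel.MEMORY_AND_DISK)
    try:
        total = 0
        for _ in range(num_iterations):
            # Every iteration re-reads the input, which is why caching it helps.
            total += sum(input_rdd.data)
        return total
    finally:
        # Release only what was cached here; leave a caller's cache untouched.
        if persisted_here:
            input_rdd.unpersist()
```

Calling `boost(ToyRDD([1, 2, 3]))` returns 18 and leaves the toy RDD at `StorageLevel.NONE`, while an input the caller had already persisted keeps its storage level afterwards. Putting the `unpersist` in `finally` is a design choice so the cached data is released even if training raises an exception.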