[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or i

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the code again. The problem I mentioned above, `But now I found another problem in BisectKMeans: in line 191 there is a iteration it also need this pattern “persist cur

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 If the last problem is really pretty related to this code, then it should change here as well. However if you're not sure there's an easy fix, we can leave it for later. Are you comfortable that the

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen yeah, the code logic here seems confusing, but I think it is right. Now I can explain it clearly: in essence, the logic can be expressed as follows: A0->I1->A1->I2->A

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62908/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Merged build finished. Test PASSed.

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)** for PR 14333 at commit [`7f042a2`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)** for PR 14333 at commit [`7f042a2`](https://github.com/apache/spark/commit/7

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Merged build finished. Test FAILed.

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62907/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)** for PR 14333 at commit [`dc17da8`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)** for PR 14333 at commit [`dc17da8`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked where `RDD.persist` is referenced: AFTSurvivalRegression, LinearRegression, and LogisticRegression persist the input training RDD and unpersist it when `train` returns, seems

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 Yes, I suppose the issue is consistency ... there are loads of places where RDDs and broadcasts aren't really cleaned up properly in the code. Maybe it's fine to take extra steps here to at least ens

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen The SparkContext, by default, runs a cleaner in the background to release unreferenced RDDs/broadcasts. But I think we had better release them ourselves because the Spar

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 OK sounds good. So maybe we're back to this: for `bcNewCenters`, is it really worth the overhead to track and destroy them? or just settle for unpersisting within the loop? I could go either way, you

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 yeah, but the `bcSyn0Global` in Word2Vec is a different case. It looks safe to destroy there because, in each loop iteration, the RDD transform which uses `bcSyn0Global` ends with a `colle

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 Yeah, I think you're right, because the unpersisted RDD can still be recomputed but not a destroyed Broadcast. Hm, then isn't this also true of `bcSyn0Global`? I suppose I think we should pr
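
The distinction drawn in this comment can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: `ToyBroadcast` and `recompute_costs` are hypothetical names invented for illustration, and the model assumes only the documented behaviour that an unpersisted broadcast is re-sent on next use while a destroyed one can no longer be used.

```python
class ToyBroadcast:
    """Toy stand-in for a broadcast variable; NOT the Spark API."""

    def __init__(self, value):
        self._driver_copy = value      # copy retained on the driver
        self._executor_copy = value    # cached copy on the executors
        self._destroyed = False
        self.resend_count = 0          # how often the value had to be re-sent

    def unpersist(self):
        # Drop only the executor-side cache; the driver copy survives.
        self._executor_copy = None

    def destroy(self):
        # Drop everything; the variable cannot be used afterwards.
        self._executor_copy = None
        self._driver_copy = None
        self._destroyed = True

    def value(self):
        if self._destroyed:
            raise RuntimeError("broadcast was destroyed")
        if self._executor_copy is None:
            # After unpersist, the value is re-sent from the driver.
            self._executor_copy = self._driver_copy
            self.resend_count += 1
        return self._executor_copy


def recompute_costs(points, bc_centers):
    """Stands in for recomputing a lost partition from its lineage."""
    centers = bc_centers.value()
    # Squared distance from each point to its nearest center.
    return [min((p - c) ** 2 for c in centers) for p in points]
```

In this model, a recomputation after `unpersist()` still succeeds at the cost of one re-send, whereas the same recomputation after `destroy()` raises. That is the reason a broadcast captured in a lineage that may be replayed is safer to unpersist than to destroy.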

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen `KMeans.initKMeansParallel` already implements the pattern "persist current step RDD, and unpersist previous one", but I think a persisted RDD can also break down because of disk err
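
The pattern "persist current step RDD, and unpersist previous one" can be sketched with a toy loop. This is plain Python rather than Spark code, and the function `run_steps` and the dictionary used as "storage" are hypothetical stand-ins; the sketch only shows the ordering of the two operations, not how `KMeans.initKMeansParallel` is actually written.

```python
def run_steps(initial, step_fn, num_steps):
    """Apply step_fn num_steps times, keeping at most two steps cached."""
    cached = {0: initial}  # step index -> materialized data (toy "storage")
    peak = len(cached)     # largest number of steps held at once
    for i in range(1, num_steps + 1):
        # Compute and "persist" the current step from the previous one.
        cached[i] = step_fn(cached[i - 1])
        peak = max(peak, len(cached))
        # Only now "unpersist" the previous step, once the new one exists.
        del cached[i - 1]
    return cached[num_steps], peak
```

Releasing the previous step only after the current one is materialized keeps storage bounded at two steps in this model, without ever leaving the loop with nothing cached.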

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 Oh, it is indeed building up a lineage. I think it's easier to leave this broadcast as-is then unless we know that destroying them is essential for reclaiming driver resources. Here's anothe

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the KMeans `bcNewCenters` code again. If we want to make sure the recovery of the RDD will succeed in any unexpected case, we have to keep all the `bcNewCenters` genera

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Merged build finished. Test PASSed.

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62793/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62793/consoleFull)** for PR 14333 at commit [`01f4d3a`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14333 I don't understand the last change. As far as I can see it can be destroyed inside the loop iteration. It's also possible to reuse the broadcast (declare outside the loop), and unpersist each iterati

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14333 **[Test build #62793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62793/consoleFull)** for PR 14333 at commit [`01f4d3a`](https://github.com/apache/spark/commit/0