Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
Merged to master
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or i
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen
I checked the code again; the problem I mentioned above:
`But now I found another problem in BisectingKMeans:
in line 191 there is an iteration which also needs this pattern "persist
cur
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
If the last problem is really pretty related to this code, then it should
change here as well. However if you're not sure there's an easy fix, we can
leave it for later. Are you comfortable that the
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen yeah, the code logic here seems confusing, but I think it is right.
Now I can explain it in a clearer way:
in essence, the logic can be expressed as follows:
A0->I1->A1->I2->A
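(Editor's note: the alternating chain above can be sketched with a toy model. `ToyRDD` and `iterate` below are hypothetical stand-ins, not Spark's API; the point is only the ordering: persist the new result before unpersisting its predecessor, so a materialized ancestor always exists in the lineage.)

```scala
// Toy model of the A0 -> I1 -> A1 -> I2 -> ... chain. Each iteration derives
// a new dataset from the previous one, "persists" it, and only then
// "unpersists" the predecessor. Illustrative names only, not Spark itself.
final class ToyRDD[T](val data: Vector[T]) {
  var persisted = false
  def persist(): ToyRDD[T] = { persisted = true; this }
  def unpersist(): ToyRDD[T] = { persisted = false; this }
  def map[U](f: T => U): ToyRDD[U] = new ToyRDD(data.map(f))
}

def iterate[T](initial: ToyRDD[T], steps: Int)(step: ToyRDD[T] => ToyRDD[T]): ToyRDD[T] = {
  var current = initial.persist()        // A0 persisted
  for (_ <- 1 to steps) {
    val next = step(current).persist()   // persist A_i first ...
    current.unpersist()                  // ... then drop A_{i-1}
    current = next
  }
  current                                // still persisted for the caller
}
```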
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62908/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62908 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)**
for PR 14333 at commit
[`7f042a2`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62908 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62908/consoleFull)**
for PR 14333 at commit
[`7f042a2`](https://github.com/apache/spark/commit/7
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62907/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62907 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)**
for PR 14333 at commit
[`dc17da8`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62907 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62907/consoleFull)**
for PR 14333 at commit
[`dc17da8`](https://github.com/apache/spark/commit/d
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen
I checked the places that reference `RDD.persist`:
AFTSurvivalRegression, LinearRegression, and LogisticRegression persist the
input training RDD and unpersist it when `train` returns; seems
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
Yes, I suppose the issue is consistency ... there are loads of places where
RDDs and broadcasts aren't really cleaned up properly in the code. Maybe it's
fine to take extra steps here to at least ens
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen
The SparkContext, by default, runs a cleaner in the background to release
unreferenced RDDs/broadcasts. But I think we'd better release them
ourselves, because the Spar
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
OK sounds good. So maybe we're back to this: for `bcNewCenters`, is it
really worth the overhead to track and destroy them? or just settle for
unpersisting within the loop? I could go either way, you
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
Yeah, but `bcSyn0Global` in Word2Vec is a different case; it looks
safe to destroy there,
because in each loop iteration, the RDD transformation which uses `bcSyn0Global`
ends with a `colle
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
Yeah, I think you're right, because the unpersisted RDD can still be
recomputed but not a destroyed Broadcast. Hm, then isn't this also true of
`bcSyn0Global`?
I suppose I think we should pr
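(Editor's note: the distinction drawn above — an unpersisted RDD can still be recomputed, a destroyed broadcast cannot — can be modeled in miniature. `TwoTierBroadcast` is a hypothetical sketch, not Spark's implementation: in Spark, `Broadcast.unpersist` drops executor-side copies that can be re-fetched from the driver, while `destroy` removes the value everywhere.)

```scala
// Toy two-tier model: `unpersist` only clears the executor-side copy, so a
// later access (e.g. a task recomputing a lost partition) can re-fetch the
// value from the driver; `destroy` removes it everywhere, so later use fails.
final class TwoTierBroadcast[T](initial: T) {
  private var driverCopy: Option[T] = Some(initial)
  private var executorCopy: Option[T] = Some(initial)
  def value: T = executorCopy.getOrElse {
    val v = driverCopy.getOrElse(
      throw new IllegalStateException("broadcast was destroyed"))
    executorCopy = Some(v) // re-fetched from the driver on demand
    v
  }
  def unpersist(): Unit = executorCopy = None
  def destroy(): Unit = { executorCopy = None; driverCopy = None }
}
```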
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen `KMeans.initKMeansParallel` already implements the pattern
"persist the current step's RDD, and unpersist the previous one", but I think a
persisted RDD can also be lost because of disk err
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
Oh, it is indeed building up a lineage. I think it's easier to leave this
broadcast as-is then unless we know that destroying them is essential for
reclaiming driver resources.
Here's anothe
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/14333
@srowen
I checked the code around the KMeans `bcNewCenters` again; if we want to make
sure RDD recovery will succeed in any unexpected case, we have to keep
all the `bcNewCenters` genera
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14333
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62793/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62793 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62793/consoleFull)**
for PR 14333 at commit
[`01f4d3a`](https://github.com/apache/spark/commit/
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14333
I don't understand the last change. As far as I can see it can be destroyed
inside the loop iteration. It's also possible to reuse the broadcast (declare
outside the loop), and unpersist each iterati
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14333
**[Test build #62793 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62793/consoleFull)**
for PR 14333 at commit
[`01f4d3a`](https://github.com/apache/spark/commit/0