Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16654
gentle ping @zhengruifeng
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and w
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16654
@zhengruifeng don't most ML libraries have separate clustering evaluators?
For example, WEKA has a ClusterEvaluation class. Scikit-learn just has a metrics
module with functions you can call, but
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
@srowen I agree that a metric should be independent of the details of the
algorithms. AUC is independent of the algorithm; it depends only on the
dataset: In spark-ml, scikit-learn, or any other packa
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16654
Sure, and classification metrics like AUC only make sense for classifiers
that output more than just a label -- they have to output a probability or
score of some kind. Not every metric necessarily m
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
Existing metrics (WSSSE, Loglikelihood) depend on the details of the algorithm.
Computing WSSSE for KMeans/BisectKMeans uses the mean vectors as the
centers, but for KMedoids the medoids, ot
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16654
Also, if some metrics are only applicable to some models, as srowen noted,
we can either make separate evaluator classes or put all metrics on one but
throw if the model does not support that
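The second option mentioned here (one evaluator holding every metric, which throws when the model cannot support the requested one) might look roughly like the sketch below. This is an illustration only, not Spark's API: `SketchClusteringEvaluator`, `CentroidModel`, `DensityModel`, and the attribute names `cluster_centers` and `log_likelihood` are all hypothetical.

```python
class SketchClusteringEvaluator:
    """Hypothetical single evaluator that rejects metrics a model cannot support."""

    # Metric name -> attribute the model must expose for the metric to be defined.
    REQUIRES = {"wssse": "cluster_centers", "loglikelihood": "log_likelihood"}

    def __init__(self, metric_name):
        if metric_name not in self.REQUIRES:
            raise ValueError(f"unknown metric: {metric_name!r}")
        self.metric_name = metric_name

    def evaluate(self, model, points):
        needed = self.REQUIRES[self.metric_name]
        if not hasattr(model, needed):
            raise TypeError(
                f"{type(model).__name__} does not support metric {self.metric_name!r}"
            )
        if self.metric_name == "wssse":
            # Sum of squared distances from each point to its nearest center.
            return sum(
                min(
                    sum((x - c) ** 2 for x, c in zip(point, center))
                    for center in model.cluster_centers
                )
                for point in points
            )
        return model.log_likelihood(points)


class CentroidModel:
    """Toy stand-in for a centroid-based model such as k-means."""

    def __init__(self, cluster_centers):
        self.cluster_centers = cluster_centers


class DensityModel:
    """Toy stand-in for a model that has no centers, such as DBSCAN."""


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
evaluator = SketchClusteringEvaluator("wssse")
print(evaluator.evaluate(CentroidModel([(0.0, 1.0), (10.0, 0.0)]), points))
try:
    evaluator.evaluate(DensityModel(), points)
except TypeError as err:
    print(err)
```

The alternative raised in the same comment, separate evaluator classes per model family, would move this check from run time to the type of evaluator the caller picks.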
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16654
Wouldn't we eventually want to add a lot more clustering metrics like Dunn,
Davies-Bouldin, Simplified Silhouette, etc.? There are a lot of clustering
metrics and it seems like a good idea to
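As an illustration of one of the metrics named here, the following is a minimal sketch of the Simplified Silhouette, assuming the usual definition in which distances to cluster centroids replace the average pairwise distances of the full silhouette. The function name, the `centers` mapping from label to centroid, and the sample data are made up for this example; it assumes at least two clusters.

```python
import math


def simplified_silhouette(points, labels, centers):
    """Mean over points of (b - a) / max(a, b), where a is the distance to the
    point's own centroid and b is the distance to the nearest other centroid."""
    scores = []
    for point, label in zip(points, labels):
        a = math.dist(point, centers[label])
        b = min(math.dist(point, c) for k, c in centers.items() if k != label)
        denom = max(a, b)
        # A point sitting on two coincident centroids gets a neutral score.
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = {0: (0.0, 1.0), 1: (10.0, 1.0)}
print(simplified_silhouette(points, labels, centers))
```

Values close to 1 indicate points much nearer to their own centroid than to any other; values near 0 or below indicate overlapping or misassigned clusters.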
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16654
Metrics evaluate the clustering though; the details of the algorithm are
irrelevant. This still clusters points in a continuous space so you can measure
WSSSE.
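A minimal sketch of that argument, not Spark's implementation: WSSSE computed from nothing but the points and their cluster labels, taking each cluster's mean as its center. The function name `wssse` and the sample data are assumptions for illustration, and the choice of the mean as the center is itself an assumption that may not suit every algorithm.

```python
from collections import defaultdict


def wssse(points, labels):
    """Within-set sum of squared errors, using each cluster's mean as its center."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Center of the cluster: coordinate-wise mean of its members.
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
print(wssse(points, labels))  # 10.0
```

Nothing in the function refers to how the labels were produced, only to the resulting assignment.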
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
@srowen The concept of a `center` doesn't exist in DBSCAN.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71805/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71805 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71805/testReport)**
for PR 16654 at commit
[`5937ce7`](https://github.com/apache/spark/commit/5
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16654
I agree that clustering metrics are different from classification metrics,
but that doesn't mean they can't have some common abstraction -- they're
applied to a model and data set and produce a numbe
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
@srowen I think I did not make my thoughts clear. WSSSE and Loglikelihood are
algorithm-specific metrics.
For example:
WSSSE doesn't make sense for clustering algorithms like DBSCAN,
GMM's
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16654
Yes, I think this is at best a duplicate of SPARK-14516. You don't want to
add ad-hoc methods for this.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71805 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71805/testReport)**
for PR 16654 at commit
[`5937ce7`](https://github.com/apache/spark/commit/59
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
Jenkins, retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71799/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71799 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71799/testReport)**
for PR 16654 at commit
[`5937ce7`](https://github.com/apache/spark/commit/59
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
I think clustering metrics are currently not that general compared with
classification/regression metrics:
WSSSE only applies to `KMeans` and `BiKMeans`
Loglikelihood only applies to `GMM`
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16654
+1 with @srowen, this should be limited to the evaluator/metrics classes.
If we have an evaluator for clustering, will we be able to use it with the
hyperparameter tuner (cross-validation)?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71717/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71717 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71717/testReport)**
for PR 16654 at commit
[`1d89914`](https://github.com/apache/spark/commit/1
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/16654
General question: isn't this what Evaluators are for?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71717 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71717/testReport)**
for PR 16654 at commit
[`1d89914`](https://github.com/apache/spark/commit/1d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71710/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71710 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71710/testReport)**
for PR 16654 at commit
[`29bda3f`](https://github.com/apache/spark/commit/2
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71710 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71710/testReport)**
for PR 16654 at commit
[`29bda3f`](https://github.com/apache/spark/commit/29
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16654
Jenkins, retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71707/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71707 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71707/testReport)**
for PR 16654 at commit
[`29bda3f`](https://github.com/apache/spark/commit/29
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16654
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71698/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71698 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)**
for PR 16654 at commit
[`bb01219`](https://github.com/apache/spark/commit/b
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16654
**[Test build #71698 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)**
for PR 16654 at commit
[`bb01219`](https://github.com/apache/spark/commit/bb