Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20520#discussion_r167082120
--- Diff: python/pyspark/ml/tests.py ---
@@ -1620,6 +1621,23 @@ def test_kmeans_summary(self):
self.assertEqual(s.k, 2)
+class KMeansTests(SparkSessionTestCase):
+
+ def test_kmeans_cosine_distance(self):
+ data = [(Vectors.dense([1.0, 1.0]),), (Vectors.dense([10.0, 10.0]),),
+ (Vectors.dense([1.0, 0.5]),), (Vectors.dense([10.0, 4.4]),),
+ (Vectors.dense([-1.0, 1.0]),), (Vectors.dense([-100.0, 90.0]),)]
+ df = self.spark.createDataFrame(data, ["features"])
+ kmeans = KMeans(k=3, seed=1)
+ kmeans.setDistanceMeasure("cosine")
--- End diff --
It was just to test that this method (setDistanceMeasure) works. Do you think
it would be better to switch to what you suggested?
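
For context on the test data in the diff, here is a minimal sketch in plain Python, assuming the usual definition of cosine distance as 1 minus cosine similarity. The helper `cosine_distance` is hypothetical and is not part of pyspark; it only illustrates why the six points fall into three directional groups regardless of magnitude.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two non-zero vectors.

    Hypothetical helper for illustration only; not a pyspark API.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# The six feature vectors used in the test from the diff.
points = [(1.0, 1.0), (10.0, 10.0),
          (1.0, 0.5), (10.0, 4.4),
          (-1.0, 1.0), (-100.0, 90.0)]

# Same direction, different magnitude: distance is (numerically) zero.
d_same = cosine_distance(points[0], points[1])

# Nearly the same direction: small distances within each pair.
d_pair2 = cosine_distance(points[2], points[3])
d_pair3 = cosine_distance(points[4], points[5])

# Orthogonal directions: distance is 1.
d_cross = cosine_distance(points[0], points[4])

print(d_same, d_pair2, d_pair3, d_cross)
```

Under this definition each pair of points is close to its partner and far from the other pairs, which is consistent with the test choosing k=3.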
---