Github user keypointt commented on the issue:
https://github.com/apache/spark/pull/17451
```
>>> from pyspark.ml.feature import Word2Vec
>>> sent = ("a b " * 100 + "a c " * 10).split(" ")
>>> doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
>>> word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence",
...                     outputCol="model")
>>> model = word2Vec.fit(doc)
```
The above is the setup. I then created the `vec` below, which
`model.findSynonyms` accepts without complaint:
```
>>> from pyspark.ml.linalg import Vectors
>>> vec = Vectors.dense([0.267, -0.2691, 0.058, -0.0801, 0.1821, 0.4162,
...                      0.0259, -0.2163, 0.1787, 0.0764])
>>> model.findSynonyms(vec, 2)
DataFrame[word: string, similarity: double]
```
However, `model.findSynonymsArray` rejects the same `vec`, even though its
type is `<class 'pyspark.ml.linalg.DenseVector'>`:
```
>>> model.findSynonymsArray(vec, 2)
word:
[0.267,-0.2691,0.058,-0.0801,0.1821,0.4162,0.0259,-0.2163,0.1787,0.0764]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/renxin/Documents/workspace/spark/python/pyspark/ml/feature.py", line 2951, in findSynonymsArray
    tuples = self._java_obj.findSynonymsArray(word, num)
  File "/Users/renxin/Documents/workspace/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/Users/renxin/Documents/workspace/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/renxin/Documents/workspace/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 324, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o65.findSynonymsArray. Trace:
py4j.Py4JException: Method findSynonymsArray([class java.util.ArrayList, class java.lang.Integer]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
>>> type(vec)
<class 'pyspark.ml.linalg.DenseVector'>
```
Here `vec` reaches the JVM as a `java.util.ArrayList` rather than as a Vector.
Does `self._java_obj.findSynonymsArray(word, num)` behave differently from
`self._call_java("findSynonyms", word, num)` for Vector arguments?
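To make the question concrete, here is a toy model of what I suspect is happening. It is pure Python and does not use PySpark or Py4J: every class and function in it (`PyVector`, `FakeJavaVector`, `FakeJavaModel`, `auto_convert`, `to_fake_java`, `call_direct`, `call_wrapped`) is hypothetical. My assumption, not confirmed in this thread, is that a direct call on `_java_obj` falls back to generic sequence conversion, while `_call_java` converts each argument explicitly before invoking the method.

```python
# Toy model only: none of these names come from PySpark or Py4J.

class PyVector:
    """Stands in for a Python-side dense vector; it is iterable."""
    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


class FakeJavaVector:
    """Stands in for the JVM-side Vector type."""
    def __init__(self, values):
        self.values = list(values)


class FakeJavaModel:
    """Stands in for the JVM model; only two 'overloads' exist."""
    def find_synonyms_array(self, word, num):
        # Accept (str, int) or (FakeJavaVector, int); reject everything else.
        if isinstance(word, (str, FakeJavaVector)) and isinstance(num, int):
            return [("b", 0.9), ("c", 0.1)][:num]
        raise TypeError(
            "Method find_synonyms_array([%s, %s]) does not exist"
            % (type(word).__name__, type(num).__name__)
        )


def auto_convert(arg):
    """Generic fallback: any non-string iterable becomes a plain list."""
    if not isinstance(arg, str) and hasattr(arg, "__iter__"):
        return list(arg)
    return arg


def to_fake_java(arg):
    """Explicit conversion: a PyVector becomes a FakeJavaVector."""
    if isinstance(arg, PyVector):
        return FakeJavaVector(arg.values)
    return arg


def call_direct(model, word, num):
    """Analogue of calling the method on the raw Java object."""
    return model.find_synonyms_array(auto_convert(word), num)


def call_wrapped(model, word, num):
    """Analogue of a wrapper that converts arguments first."""
    return model.find_synonyms_array(to_fake_java(word), num)
```

In this sketch `call_wrapped(FakeJavaModel(), PyVector([0.1, 0.2]), 2)` succeeds, while `call_direct` with the same vector raises because the argument has already been flattened to a list. If the real code paths differ in the same way, routing `findSynonymsArray` through an explicit conversion step would be one possible fix.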
Thank you, Holden.