Github user keypointt commented on the issue:
https://github.com/apache/spark/pull/17451
Hi @MLnick, I'm stuck trying to add test cases for Python.
I ran the code below in the PySpark shell (`./bin/pyspark`):
```
from pyspark.ml.feature import Word2Vec
sent = ("a b " * 100 + "a c " * 10).split(" ")
doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence",
                    outputCol="model")
model = word2Vec.fit(doc)
model.findSynonyms("a", 2)
model.findSynonymsArray("a", 2)
```
and for `findSynonyms()`, I got results as expected:
```
>>> model.findSynonyms("a", 2)
hahaha: Dataset
JavaObject id=o143
DataFrame[word: string, similarity: double]
```
but for `findSynonymsArray()` I got the output below, which contains no data:
```
>>> model.findSynonymsArray("a", 2)
[{u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'}]
```
I tried to debug and found that `r` falls into the
`elif isinstance(r, (JavaArray, JavaList)):` branch and is dumped directly.
It seems `Py4J` is not handling the returned object properly? See
https://github.com/apache/spark/blob/master/python/pyspark/ml/common.py#L90
Could you please give me a hint here? I'm now trying to dig more into Py4J,
but it could take me some time. Thank you very much.
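If the elements coming back really are Py4J proxies for `scala.Tuple2`, one
possible workaround is to unpack them on the Python side through the `_1()`
and `_2()` accessors rather than relying on the generic conversion. Below is a
minimal sketch under that assumption. `FakeTuple2` and `unpack_tuple2s` are
hypothetical names used only so the snippet runs without a JVM, and the
sample values are made up:
```python
class FakeTuple2:
    """Hypothetical stand-in for a Py4J proxy of scala.Tuple2 (illustration only)."""

    def __init__(self, first, second):
        self._first = first
        self._second = second

    def _1(self):
        # Mirrors scala.Tuple2._1: the first element (here, the word).
        return self._first

    def _2(self):
        # Mirrors scala.Tuple2._2: the second element (here, the similarity).
        return self._second


def unpack_tuple2s(java_tuples):
    """Turn an iterable of Tuple2-like objects into Python (word, similarity) pairs."""
    return [(t._1(), t._2()) for t in java_tuples]


# Made-up sample data, not real Word2Vec output.
pairs = unpack_tuple2s([FakeTuple2("b", 0.25), FakeTuple2("c", 0.125)])
print(pairs)
```
In the real wrapper the same list comprehension would presumably be applied to
whatever the JVM call returns, but I have not verified that against Spark.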