I'm getting a strange error when I try to use the result of a
RowMatrix.columnSimilarities call in PySpark. Hoping to get a second
opinion.

I'm somewhat new to Spark. To me it looks like the RDD behind the
CoordinateMatrix returned by columnSimilarities() doesn't have a handle on
the Spark context. Is there something I'm missing, or might there be a bug in
how the result is translated back to Python from the JVM?
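
To make that suspicion concrete, here is a rough sketch of how the context
could be inspected. describe_context is a hypothetical helper of my own, not
part of PySpark; it only reads rdd.context and the private _jsc attribute
that shows up in the stack trace further down, and the stand-in objects exist
just so it can run without a cluster.

```python
from types import SimpleNamespace


def describe_context(rdd, expected_sc):
    """Report how an RDD's SparkContext relates to the one we expect.

    Hypothetical helper: relies only on `rdd.context` and the private
    `_jsc` handle on the context, both of which appear in the traceback.
    """
    ctx = getattr(rdd, "context", None)
    return {
        "has_context": ctx is not None,
        "same_as_expected": ctx is expected_sc,
        "jsc_is_none": getattr(ctx, "_jsc", None) is None,
    }


# Stand-in objects (not real Spark objects) to exercise the helper.
live_sc = SimpleNamespace(_jsc=object())
dead_sc = SimpleNamespace(_jsc=None)
print(describe_context(SimpleNamespace(context=live_sc), live_sc))
print(describe_context(SimpleNamespace(context=dead_sc), live_sc))

# In a real session (assumption: `spark` and `sims` exist as in the
# reproduction snippet):
# print(describe_context(sims.entries, spark.sparkContext))
```

If jsc_is_none comes back True for sims.entries, that would match the
AttributeError; if same_as_expected is False, the RDD is holding a different
context object than the one the session reports.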

I found a related post on Stack Overflow, but it has no responses yet:
https://stackoverflow.com/questions/44929009/collecting-pyspark-matrixs-entries-raise-a-weird-error-when-run-in-test?rq=1

Here's the PySpark documentation on columnSimilarities() (it's just a
Java/Scala function call):
http://spark.apache.org/docs/latest/api/python/_modules/pyspark/mllib/linalg/distributed.html#RowMatrix.columnSimilarities

*This snippet should reproduce the issue:*
--------
from pyspark.mllib.linalg.distributed import RowMatrix

# Assumes `spark` is an existing SparkSession.
rows = spark.sparkContext.parallelize([[0, 1, 2], [1, 1, 1]])
matrix = RowMatrix(rows)
sims = matrix.columnSimilarities()

print(sims.numRows(), sims.numCols())  # Prints correctly: "3 3"
print(sims.entries.collect())  # Error: 'NoneType' object has no attribute 'setCallSite'
--------

*Full stack trace of the Error:*
--------
AttributeError                            Traceback (most recent call last)
<ipython-input-45-e1d79c0da460> in <module>()
--> 1 sims.entries.collect()

/usr/lib/spark/python/pyspark/rdd.py in collect(self)
    821             to be small, as all the data is loaded into the driver's memory.
    822         """
--> 823         with SCCallSiteSync(self.context) as css:
    824             port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
    825         return list(_load_from_socket(port, self._jrdd_deserializer))

/usr/lib/spark/python/pyspark/traceback_utils.py in __enter__(self)
     70     def __enter__(self):
     71         if SCCallSiteSync._spark_stack_depth == 0:
---> 72             self._context._jsc.setCallSite(self._call_site)
     73         SCCallSiteSync._spark_stack_depth += 1
     74 

AttributeError: 'NoneType' object has no attribute 'setCallSite'





