Hi,

I define an instance variable, self.numRows = 10, in __init__ so that it is
available to all methods of the class, as below:

class RandomData:
    def __init__(self, spark_session, spark_context):
        self.spark = spark_session
        self.sc = spark_context
        self.config = config
        self.values = dict()
        self.numRows = 10

In another method of the same class, I use a lambda function to generate
random values:

    def generateRamdomData(self):
        rdd = self.sc.parallelize(Range). \
            map(lambda x: (x, uf.clustered(x, self.numRows), \

This fails with the error below

Could not serialize object: Exception: It appears that you are attempting
to reference SparkContext from a broadcast variable, action, or
transformation. SparkContext can only be used on the driver, not in code
that it run on workers. For more information, see SPARK-5063.

However, this works if I first assign self.numRows to a local variable in
that method, as below:


        numRows = self.numRows
        rdd = self.sc.parallelize(Range). \
            map(lambda x: (x, uf.clustered(x, numRows), \
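
A minimal sketch of what is probably happening, using only the standard
library rather than Spark itself. A lambda that mentions self.numRows closes
over the whole self object, including self.sc, while a lambda that mentions a
local numRows closes over just the integer. PySpark has to serialize the
lambda together with whatever it closes over before shipping it to the
workers, and a SparkContext refuses to be serialized. FakeContext, captured
and can_pickle are hypothetical names made up for this illustration, and
x % numRows merely stands in for uf.clustered.

```python
import pickle


class FakeContext:
    """Hypothetical stand-in for SparkContext: it refuses to be pickled."""

    def __reduce__(self):
        raise RuntimeError("context can only be used on the driver")


class RandomData:
    def __init__(self, context):
        self.sc = context
        self.numRows = 10

    def lambda_using_self(self):
        # The lambda body mentions self, so self is what gets captured.
        return lambda x: (x, x % self.numRows)

    def lambda_using_local(self):
        # Copy the value out first; the lambda then captures only the int.
        numRows = self.numRows
        return lambda x: (x, x % numRows)


def captured(func):
    """Map each free variable name of func to the object it closes over."""
    cells = func.__closure__ or ()
    return dict(zip(func.__code__.co_freevars,
                    (cell.cell_contents for cell in cells)))


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


rd = RandomData(FakeContext())
via_self = captured(rd.lambda_using_self())
via_local = captured(rd.lambda_using_local())

print(sorted(via_self))                   # ['self']
print(via_local)                          # {'numRows': 10}
print(can_pickle(via_self["self"]))       # False: self drags self.sc along
print(can_pickle(via_local["numRows"]))   # True: a plain int
```

If this reading is right, the local-variable assignment is the usual fix
rather than a hack: it keeps self, and therefore the SparkContext, out of the
closure that is sent to the workers.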



Is there a good explanation for this behaviour?


Thanks


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



