Many thanks Pol.
As it happens I was doing a workaround with numRows = 10. In general it
is bad practice to hard-code constants in the code. For the same
reason we ought not embed URLs in the PySpark program itself.
What I did was to add numRows to the yaml file, which is read at runtime.
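The yaml file itself isn't shown in the thread, so here is only a minimal sketch of the same externalize-the-constant idea. It uses the stdlib json module so the snippet is self-contained; with PyYAML the read would be yaml.safe_load(f) instead, and the key name "numRows" is an assumption:

```python
import json
import os
import tempfile

# Hypothetical config content; the author's actual yaml file is not shown.
cfg_text = '{"numRows": 10}'

# Write the config to a temporary file to stand in for the real config file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(cfg_text)
    path = f.name

# Read it back at startup: the program body carries no hard-coded constant.
with open(path) as f:
    config = json.load(f)

num_rows = config["numRows"]
os.remove(path)
```

The same pattern works for URLs or any other environment-specific value: the code asks the config object, and only the file changes between environments.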
To me it looks like you are accessing "self" on the workers by using
"self.numRows" inside the map. As a consequence, "self" needs to be
serialized, and it has an attribute referencing the "sparkContext", so
Spark ends up trying to serialize the context and failing.
It can be solved in different ways, for instance ...
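Pol's message is cut off here, but the usual fix for this error is to copy the needed value into a local variable before the map, so the closure captures only that value and never references self. A minimal sketch of why this works, driving pickle directly (which is what Spark uses to ship closures to workers); threading.Lock stands in for the unpicklable SparkContext:

```python
import pickle
import threading

class RandomData:
    def __init__(self):
        # Stand-in for SparkContext: locks cannot be pickled,
        # just like a real SparkContext.
        self.sc = threading.Lock()
        self.numRows = 10

rd = RandomData()

# Shipping the whole object (what happens when a closure captures self)
# fails, because the unpicklable attribute comes along for the ride.
try:
    pickle.dumps(rd)
    self_picklable = True
except TypeError:
    self_picklable = False

# The fix: bind the plain value to a local variable first, so the
# closure captures only the int and never touches self.
num_rows = rd.numRows
shipped = pickle.dumps(num_rows)

# In PySpark the same idea looks like this (sketch, assuming some rdd):
#     num_rows = self.numRows
#     rdd.map(lambda x: x % num_rows)
```

The key point is that closure capture is by variable, so rebinding to a local changes what gets serialized from the whole instance down to a single int.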
Hi,
I define a class instance variable self.numRows = 10 to make it available
to all methods of this class, as below:
class RandomData:
    def __init__(self, spark_session, spark_context, config):
        self.spark = spark_session
        self.sc = spark_context
        self.config = config
        self.numRows = 10   # instance variable used by all methods
        self.values = ...