Re: class instance variable in PySpark used in lambda function

2021-12-16 Thread Mich Talebzadeh
Many thanks Pol. As it happens, I was working around the issue with numRows = 10. In general it is bad practice to hard-code constants in the code, for the same reason that we ought not to embed URLs in the PySpark program itself. What I did was to add numRows to the yaml file, which is read at
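The idea above (keep tunables like numRows out of the code and read them from an external config at startup) can be sketched as follows. The thread uses a yaml file; the sketch below uses json only to stay stdlib-only, and the section name "RandomData" is a hypothetical key, not confirmed by the thread.

```python
import json

# Stand-in for the contents of an external config file. With PyYAML the
# equivalent would be yaml.safe_load(open("config.yml")); json is used
# here purely so the sketch has no third-party dependency.
config_text = '{"RandomData": {"numRows": 10}}'
config = json.loads(config_text)

# The code now carries no hard-coded constant; change the config, not the program.
numRows = config["RandomData"]["numRows"]
print(numRows)
```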

Re: class instance variable in PySpark used in lambda function

2021-12-15 Thread Pol Santamaria
To me it looks like you are accessing "self" on the workers by using "self.numRows" inside the map. As a consequence, "self" needs to be serialized, and it has an attribute referencing the "sparkContext", so Spark tries to serialize the context and fails. It can be solved in different ways, for insta
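Pol's diagnosis can be illustrated without Spark, since stdlib pickle exhibits the same behaviour. The RandomData class below is a minimal stand-in, not the original; threading.Lock is an assumption used to mimic the unpicklable SparkContext held in self.sc.

```python
import pickle
import threading

# Minimal sketch of the mechanism described above, with no Spark needed:
# threading.Lock stands in for the unpicklable SparkContext in self.sc.
class RandomData:
    def __init__(self):
        self.sc = threading.Lock()   # not serializable, like a SparkContext
        self.numRows = 10

rd = RandomData()

# Referencing self.numRows inside a map() lambda forces the whole object
# to be serialized -- which fails because of self.sc:
try:
    pickle.dumps(rd)
except TypeError as e:
    print("serializing self fails:", e)

# One common fix: copy the attribute to a plain local first, so the
# closure captures only an int, which serializes cleanly.
numRows = rd.numRows
payload = pickle.dumps(numRows)
print("serializing the local succeeds:", pickle.loads(payload))
```

In real PySpark code the same pattern applies: assign numRows = self.numRows before the map, and reference the local inside the lambda.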

class instance variable in PySpark used in lambda function

2021-12-15 Thread Mich Talebzadeh
Hi,

I define a class instance variable self.numRows = 10 to be available to all methods of this class, as below:

    class RandomData:
        def __init__(self, spark_session, spark_context):
            self.spark = spark_session
            self.sc = spark_context
            self.config = config
            self.values =