Hi all, I'm trying to create a new object inside a map function, but PySpark reports a lot of errors. Is it legal to do so? Here is my code:
from pyspark import SparkContext

class Node(object):
    def __init__(self, A, B, C):
        self.A = A
        self.B = B
        self.C = C

def make_vertex(pair):
    A, (B, C) = pair
    return Node(A, B, C)

dictionary = {'PYTHONPATH': '/home/grad02/xss/opt/old'}
sc = SparkContext("spark://zzz:7077", "test job", environment=dictionary)
rdd = sc.parallelize([[1, (2, 3)]])

# Calling make_vertex directly (no map) works fine:
noMap = make_vertex([1, (2, 3)])
print(noMap.A)

# This is the part that fails:
myRdd = rdd.map(make_vertex)
result = myRdd.collect()

Could anybody tell me whether creating a new object inside a map function is legal in PySpark?

Thanks,
Kaden
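For what it's worth, the same transformation runs fine with Python's built-in map, so the mapping logic itself seems sound. A minimal local sketch with no Spark involved (the second sample pair is made up for illustration):

```python
class Node(object):
    """Simple vertex container, same shape as in the Spark job."""
    def __init__(self, A, B, C):
        self.A = A
        self.B = B
        self.C = C

def make_vertex(pair):
    # Unpack (A, (B, C)) and build a Node from it.
    A, (B, C) = pair
    return Node(A, B, C)

# Built-in map instead of rdd.map -- pure Python, no cluster needed.
vertices = list(map(make_vertex, [(1, (2, 3)), (4, (5, 6))]))
print([v.A for v in vertices])  # prints [1, 4]
```

If this local version works but the Spark version fails, the problem is likely with shipping the function and class to the worker processes rather than with object creation itself.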