Spark 2.1.1 & Python 2.7.11

I want to union another RDD inside `DStream.transform()`, like below:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext()
ssc = StreamingContext(sc, 1)
init_rdd = sc.textFile('file:///home/zht/PycharmProjects/test/text_file.txt')
lines = ssc.socketTextStream('localhost', 9999)
lines = lines.transform(lambda rdd: rdd.union(init_rdd))
lines.pprint()
ssc.start()
ssc.awaitTermination()
```

And I get an error about pickle:

```
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1338, in takeUpToNumLeft
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 144, in load_stream
    yield self._read_with_length(stream)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
    return self.loads(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 454, in loads
    return pickle.loads(obj)
UnpicklingError: unpickling stack underflow
```

`text_file.txt` contains only one letter: `a`
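For context on what the final frame means: `UnpicklingError: unpickling stack underflow` is raised by Python's own `pickle` module when the byte stream reaches a STOP opcode without any object having been built first, i.e. the worker received a serialized payload that is empty or cut short. This is a hedged, minimal reproduction of that exact error using only the standard library (no Spark involved), to show the error comes from a malformed stream rather than from the data in `text_file.txt`:

```python
import pickle

# b'.' is a bare STOP opcode: a "pickle" with no payload before it.
# Unpickling it pops from an empty stack, triggering the same error
# the Spark worker reported.
try:
    pickle.loads(b'.')
except pickle.UnpicklingError as e:
    print(type(e).__name__, '-', e)
```

This suggests the RDD reaching the worker was serialized incorrectly (a known hazard when a driver-side RDD such as `init_rdd` is captured inside a `transform()` closure), not that the file contents are at fault.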