GitHub user majdou41 commented on the issue:
https://github.com/apache/spark/pull/17282
My code is:
sc.binaryFiles('hdfs://localhost:9000/user/majdouline/Training').repartition(90).collect()
and I got this error: UTF8Deserializer(True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../spark/python/pyspark/rdd.py", line 811, in collect
return list(_load_from_socket(port, self._jrdd_deserializer))
File ".../spark/python/pyspark/serializers.py", line 549, in load_stream
yield self.loads(stream)
File ".../spark/python/pyspark/serializers.py", line 544, in loads
return s.decode("utf-8") if self.use_unicode else s
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
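The last frame of the traceback can be reproduced without Spark. Below is a minimal sketch in plain Python, assuming only what the error message says: the first byte handed to the UTF-8 decoder is 0x80. The helper name try_utf8 and the sample tuple are hypothetical and are not part of PySpark. The second half is a guess rather than a diagnosis: it shows that a protocol-2 pickle also starts with 0x80, which would match binary or pickled data being passed to a string deserializer.

```python
import pickle


def try_utf8(data):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


# 0x80 is a UTF-8 continuation byte, so it cannot begin a character.
err = try_utf8(b"\x80abc")
print(type(err).__name__, err.start, err.reason)

# Assumption (not confirmed for this job): the stream may hold pickled
# records. A protocol-2 pickle begins with the opcode byte 0x80.
payload = pickle.dumps(("some/path", b"\x00\x01"), protocol=2)
print(payload[:1] == b"\x80")
```

If the guess is right, the failure would come from which deserializer is chosen for the repartitioned RDD rather than from the files themselves.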
I changed rdd.py and serializers.py (from version 2.1.0 to 2.0.2), but I got the
same error.
Can you please help me fix this?