[GitHub] spark issue #17282: [SPARK-19872][PYTHON] Use the correct deserializer for R...

majdou41 Fri, 09 Feb 2018 05:09:07 -0800

Github user majdou41 commented on the issue:

    https://github.com/apache/spark/pull/17282
  
    My code is:
    sc.binaryFiles('hdfs://localhost:9000/user/majdouline/Training').repartition(90).collect()
    
    and I got this error: UTF8Deserializer(True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/rdd.py", line 811, in collect
        return list(_load_from_socket(port, self._jrdd_deserializer))
      File ".../spark/python/pyspark/serializers.py", line 549, in load_stream
        yield self.loads(stream)
      File ".../spark/python/pyspark/serializers.py", line 544, in loads
        return s.decode("utf-8") if self.use_unicode else s
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
    
    I changed rdd.py and serializers.py (version 2.1.0 to 2.0.2), but I got the same error.
    Can you please help me fix this?
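
For what it's worth, the decode failure in the traceback can be reproduced outside Spark. The sketch below rests on an assumption: that the bytes reaching the UTF-8 deserializer are pickled data. A leading 0x80 byte is consistent with that, since pickle protocol 2 and later start with that opcode, but the traceback alone does not prove it. The `record` tuple is a made-up stand-in for one (path, content) pair, not real data from the job.

```python
import pickle

# Hypothetical stand-in for one (path, content) record; not taken from the job above.
record = ("hdfs://example/path", b"\x00\x01binary")

# Protocol 2 pickles begin with the PROTO opcode, byte 0x80.
payload = pickle.dumps(record, protocol=2)
first_byte = bytearray(payload)[0]

# Treating the pickled bytes as UTF-8 text fails on the very first byte,
# because 0x80 is a continuation byte and cannot start a UTF-8 sequence.
try:
    payload.decode("utf-8")
    decoded_ok = True
    error_message = ""
except UnicodeDecodeError as exc:
    decoded_ok = False
    error_message = str(exc)

# Unpickling the same bytes recovers the original record.
restored = pickle.loads(payload)

print(hex(first_byte))
print(error_message)
```

If that assumption holds, the data itself is fine and the problem is only which deserializer gets applied to it on collect().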



