Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16429
  
    Thanks for your interest, @azmras. I just checked it as below:
    
    ```python
    sc.parallelize(range(100), 8)
    ```
    
    ```
    Traceback (most recent call last):
      File ".../spark/python/pyspark/cloudpickle.py", line 107, in dump
        return Pickler.dump(self, obj)
      File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 409, in dump
        self.save(obj)
      File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
        f(self, obj) # Call unbound method with explicit self
      File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
        save(element)
      File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
        f(self, obj) # Call unbound method with explicit self
      File ".../spark/python/pyspark/cloudpickle.py", line 214, in save_function
        self.save_function_tuple(obj)
      File ".../spark/python/pyspark/cloudpickle.py", line 244, in save_function_tuple
        code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
      File ".../spark/python/pyspark/cloudpickle.py", line 306, in extract_func_data
        func_global_refs = self.extract_code_globals(code)
      File ".../spark/python/pyspark/cloudpickle.py", line 288, in extract_code_globals
        out_names.add(names[oparg])
    IndexError: tuple index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/rdd.py", line 198, in __repr__
        return self._jrdd.toString()
      File ".../spark/python/pyspark/rdd.py", line 2438, in _jrdd
        self._jrdd_deserializer, profiler)
      File ".../spark/python/pyspark/rdd.py", line 2371, in _wrap_function
        pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
      File ".../spark/python/pyspark/rdd.py", line 2357, in _prepare_for_python_RDD
        pickled_command = ser.dumps(command)
      File ".../spark/python/pyspark/serializers.py", line 452, in dumps
        return cloudpickle.dumps(obj, 2)
      File ".../spark/python/pyspark/cloudpickle.py", line 667, in dumps
        cp.dump(obj)
      File ".../spark/python/pyspark/cloudpickle.py", line 115, in dump
        if "'i' format requires" in e.message:
    AttributeError: 'IndexError' object has no attribute 'message'
    ```
    
    It looks like another issue with Python 3.6.0. This one is only related to the hijacked `collections.namedtuple`.
    
    We should port https://github.com/cloudpipe/cloudpickle/commit/4945361c2db92095f934b92a6c00316243caf3cc.
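    
    For context, the `IndexError` comes from `extract_code_globals` walking raw bytecode by hand, which no longer works after Python 3.6 switched to 2-byte wordcode. A robust way to collect global names (sketched here for illustration; it is not the exact patch in the linked commit) is to let the `dis` module decode the instructions instead:
    
    ```python
    import dis
    
    def extract_code_globals(code):
        # Let dis decode the bytecode rather than indexing opargs manually;
        # dis.get_instructions understands the wordcode format of Python 3.6+.
        names = set()
        for instr in dis.get_instructions(code):
            if instr.opname in ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"):
                names.add(instr.argval)
        return names
    
    def f(x):
        # `len` and `some_global` are both resolved via LOAD_GLOBAL here.
        return len(x) + some_global
    
    print(sorted(extract_code_globals(f.__code__)))
    ```
    
    This avoids the manual `oparg` bookkeeping entirely, so the same code works across bytecode format changes.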
