[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

BryanCutler Tue, 08 Nov 2016 17:18:30 -0800

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/15821
  
    The test currently fails with the following stack trace:
    ```
    Traceback (most recent call last):
      File "/home/bryan/git/spark/python/pyspark/sql/tests.py", line 2000, in test_arrow_round_trip
        pdf_arrow = df.toPandas(useArrow=True)
      File "/home/bryan/git/spark/python/pyspark/sql/dataframe.py", line 1535, in toPandas
        return self.collectAsArrow().to_pandas()
      File "/home/bryan/git/spark/python/pyspark/sql/dataframe.py", line 353, in collectAsArrow
        return list(_load_from_socket(port, ArrowSerializer()))[0]
      File "/home/bryan/git/spark/python/pyspark/rdd.py", line 140, in _load_from_socket
        for item in serializer.load_stream(rf):
      File "/home/bryan/git/spark/python/pyspark/serializers.py", line 139, in load_stream
        yield self._read_with_length(stream)
      File "/home/bryan/git/spark/python/pyspark/serializers.py", line 164, in _read_with_length
        return self.loads(obj)
      File "/home/bryan/git/spark/python/pyspark/serializers.py", line 192, in loads
        return reader.get_record_batch(0)
      File "pyarrow/ipc.pyx", line 150, in pyarrow.ipc.ArrowFileReader.get_record_batch (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/ipc.cxx:3074)
        check_status(self.reader.get().GetRecordBatch(i, &batch))
      File "pyarrow/error.pyx", line 31, in pyarrow.error.check_status (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/error.cxx:1185)
        raise ArrowException(frombytes(c_message))
    ArrowException: Invalid: metadata size invalid
    ```
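
For orientation, the `load_stream` and `_read_with_length` frames in the trace are a length-prefixed framing step: each payload is read as a byte count followed by that many bytes, and only then passed to `loads`. Below is a minimal standard-library sketch of that pattern. It is not the PySpark implementation. The function names are hypothetical, and the 4-byte big-endian signed length prefix is an assumption. It does not reproduce the Arrow error; it only illustrates how a payload reaches the deserializer.

```python
import io
import struct


def write_with_length(payload, stream):
    """Write one frame: a 4-byte big-endian length, then the payload bytes."""
    stream.write(struct.pack("!i", len(payload)))
    stream.write(payload)


def read_with_length(stream):
    """Read one frame and return its payload bytes.

    Raises EOFError at a clean end of stream and ValueError when the
    stream ends in the middle of a frame.
    """
    header = stream.read(4)
    if header == b"":
        raise EOFError("end of stream")
    if len(header) < 4:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack("!i", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise ValueError("truncated payload: expected %d bytes, got %d"
                         % (length, len(payload)))
    return payload


def load_stream(stream):
    """Yield payloads until the stream ends cleanly."""
    while True:
        try:
            yield read_with_length(stream)
        except EOFError:
            return


if __name__ == "__main__":
    buf = io.BytesIO()
    write_with_length(b"first payload", buf)
    write_with_length(b"second payload", buf)
    buf.seek(0)
    for frame in load_stream(buf):
        print(len(frame), frame)
```

In a scheme like this, the deserializer sees exactly the bytes between length prefixes, so a quick check when debugging is whether the bytes written on the sending side match, in length and content, what the reader receives.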


