GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/15821
The test currently fails with the following stack trace:
```
Traceback (most recent call last):
  File "/home/bryan/git/spark/python/pyspark/sql/tests.py", line 2000, in test_arrow_round_trip
    pdf_arrow = df.toPandas(useArrow=True)
  File "/home/bryan/git/spark/python/pyspark/sql/dataframe.py", line 1535, in toPandas
    return self.collectAsArrow().to_pandas()
  File "/home/bryan/git/spark/python/pyspark/sql/dataframe.py", line 353, in collectAsArrow
    return list(_load_from_socket(port, ArrowSerializer()))[0]
  File "/home/bryan/git/spark/python/pyspark/rdd.py", line 140, in _load_from_socket
    for item in serializer.load_stream(rf):
  File "/home/bryan/git/spark/python/pyspark/serializers.py", line 139, in load_stream
    yield self._read_with_length(stream)
  File "/home/bryan/git/spark/python/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/home/bryan/git/spark/python/pyspark/serializers.py", line 192, in loads
    return reader.get_record_batch(0)
  File "pyarrow/ipc.pyx", line 150, in pyarrow.ipc.ArrowFileReader.get_record_batch (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/ipc.cxx:3074)
    check_status(self.reader.get().GetRecordBatch(i, &batch))
  File "pyarrow/error.pyx", line 31, in pyarrow.error.check_status (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/error.cxx:1185)
    raise ArrowException(frombytes(c_message))
ArrowException: Invalid: metadata size invalid
```