Benjamin Duffield created ARROW-2307:
----------------------------------------
             Summary: Unable to read arrow stream containing 0 record batches using pyarrow
                 Key: ARROW-2307
                 URL: https://issues.apache.org/jira/browse/ARROW-2307
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.8.0
            Reporter: Benjamin Duffield


Using Java Arrow, I'm creating an Arrow stream with the stream writer. Sometimes I don't have anything to serialize, so I don't write any record batches. My Arrow stream thus consists of just a schema message.

{code:java}
<SCHEMA>
<EOS [optional]: int32>
{code}

I am able to deserialize this stream correctly using the Java stream reader, but reading it with Python raises an error:

{code}
import pyarrow as pa
# ...
reader = pa.open_stream(stream)
df = reader.read_all().to_pandas()
{code}

produces

{code}
  File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
  File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: Must pass at least one record batch
{code}

i.e. we're hitting the check in https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284

The workaround we're currently using is to always serialize at least one record batch, even if it's empty. However, I think it would be nice to either support a stream without record batches, or explicitly disallow it and make the Java behaviour match.
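
For anyone who cannot change the writer, a reader-side guard may also work. The following is only a minimal sketch, not code from this report: `read_all_tolerating_empty` and its `on_empty` parameter are hypothetical names, and it assumes the reader returned by `pa.open_stream(...)` can be iterated to yield record batches and exposes a `.schema` attribute. It avoids building a table when no batches arrived, since the traceback above points at the at-least-one-batch check.

```python
def read_all_tolerating_empty(reader, on_empty=None):
    """Drain a record-batch reader, tolerating a stream with zero batches.

    Assumption: `reader` behaves like the object returned by
    pa.open_stream(...), i.e. it is iterable over record batches and
    has a `.schema` attribute.
    """
    batches = list(reader)
    if not batches:
        # Schema-only stream: do not try to build a table from an empty
        # list. The caller decides what "empty" means via `on_empty`
        # (hypothetical hook that receives the stream's schema).
        return None if on_empty is None else on_empty(reader.schema)

    # Imported lazily so the empty path has no pyarrow dependency.
    import pyarrow as pa
    return pa.Table.from_batches(batches)
```

With this sketch, a `None` result signals a schema-only stream, and the caller can build an empty DataFrame from the schema's field names instead of calling `to_pandas()`.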