Dmitry Kalinkin created ARROW-2571:
--------------------------------------

             Summary: [C++] Lz4Codec doesn't properly handle empty data
                 Key: ARROW-2571
                 URL: https://issues.apache.org/jira/browse/ARROW-2571
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Dmitry Kalinkin


For example, the following closure test fails:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

data = [pa.array([None] * 10)]
batch = pa.RecordBatch.from_arrays(data, ['x'])
table = pa.Table.from_batches([batch])
pq.write_table(table, "test.parquet", compression='LZ4')
table = pq.read_table("test.parquet")
{code}
with the following error:
{code:java}
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    table = pq.read_table("test.parquet")
  File "python3.6/site-packages/pyarrow/parquet.py", line 987, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "python3.6/site-packages/pyarrow/parquet.py", line 149, in read
    nthreads=nthreads)
  File "_parquet.pyx", line 736, in pyarrow._parquet.ParquetReader.read_all
  File "error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: Corrupt Lz4 compressed data.
{code}
Writing a file with LZ4 compression from Python requires the patch for 
ARROW-2570. However, the issue can also be reproduced by creating an input 
file with parquet-cpp. The file must be compressed with LZ4 and contain a 
column with only null values.
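
The empty-input case can also be checked at the codec level, without going through Parquet. Below is a minimal sketch of such a closure check. It assumes that the failing input is a zero-length buffer (as the summary suggests), and it uses zlib purely as a stand-in codec because LZ4 is not in the Python standard library. The helper names `roundtrip` and `check_closure` are hypothetical and not part of pyarrow or parquet-cpp.

```python
import zlib


def roundtrip(payload: bytes) -> bytes:
    """Compress and then decompress a payload with the stand-in codec (zlib)."""
    compressed = zlib.compress(payload)
    return zlib.decompress(compressed)


def check_closure(payloads):
    """Return the payloads that do not survive a compress/decompress round trip."""
    failures = []
    for payload in payloads:
        try:
            ok = roundtrip(payload) == payload
        except zlib.error:
            # A codec that rejects its own output counts as a closure failure.
            ok = False
        if not ok:
            failures.append(payload)
    return failures


# The empty buffer is the case of interest; the others are sanity checks.
failures = check_closure([b"", b"\x00" * 10, b"abc"])
print("failing payloads:", failures)
```

A correct codec should report no failing payloads here, including for the empty buffer. The same check, pointed at the LZ4 codec, would be expected to flag `b""` if the assumption above holds.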



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
