[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658468591

OK, writing is disabled but old files can still be read

```
In [2]: pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')
---
OSError                                   Traceback (most recent call last)
 in
> 1 pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')

~/code/arrow/python/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, write_statistics, use_deprecated_int96_timestamps, coerce_timestamps, allow_truncated_timestamps, data_page_size, flavor, filesystem, compression_level, use_byte_stream_split, data_page_version, **kwargs)
   1632                 data_page_version=data_page_version,
   1633                 **kwargs) as writer:
-> 1634             writer.write_table(table, row_group_size=row_group_size)
   1635     except Exception:
   1636         if _is_path_like(where):

~/code/arrow/python/pyarrow/parquet.py in write_table(self, table, row_group_size)
    586             raise ValueError(msg)
    587
--> 588         self.writer.write_table(table, row_group_size=row_group_size)
    589
    590     def close(self):

~/code/arrow/python/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.write_table()
   1406
   1407         with nogil:
-> 1408             check_status(self.writer.get()
   1409                          .WriteTable(deref(ctable), c_row_group_size))
   1410

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     97             raise IOError(errno, message)
     98         else:
---> 99             raise IOError(message)
    100     elif status.IsOutOfMemory():
    101         raise ArrowMemoryError(message)

OSError: Per ARROW-9424, writing files with LZ4 compression has been disabled until implementation issues have been resolved. It is recommended to read any existing files and rewrite them using a different compression.
In ../src/parquet/arrow/writer.cc, line 684, code: WriteColumnChunk(table.column(i), offset, size)

In [3]: pq.read_table('example.parquet.lz4').to_pandas()
Out[3]:
   f0
0   1
1   2
2   3
3   4
4   5
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
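The error message above recommends reading existing LZ4 files and rewriting them with a different compression. A minimal sketch of that migration follows, assuming pyarrow is installed. The helper names `choose_codec` and `rewrite_parquet` and the `snappy` fallback are hypothetical choices for illustration; only `pq.read_table` and `pq.write_table` are taken from the session shown in the comment.

```python
def choose_codec(requested, fallback="snappy"):
    """Return the requested codec name, substituting `fallback` for LZ4.

    Hypothetical helper: the 'snappy' default is an assumption, not a
    recommendation made in the pull request.
    """
    codec = requested.lower()
    if codec == "lz4":
        return fallback
    return codec


def rewrite_parquet(src, dst, compression="snappy"):
    """Read an existing Parquet file and write it back with another codec.

    Hypothetical helper. pyarrow is imported lazily so that choose_codec
    can be used without pyarrow being installed.
    """
    import pyarrow.parquet as pq

    # Reading files already written with LZ4 still works per the comment above.
    table = pq.read_table(src)
    # Writing with 'lz4' would raise OSError, so the codec is filtered first.
    pq.write_table(table, dst, compression=choose_codec(compression))
    return dst
```

For example, `rewrite_parquet('example.parquet.lz4', 'example.parquet.snappy')` would read the file from the session above and write a Snappy-compressed copy; the output file name is illustrative.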
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428 I don't recall but that may have been the case. Either way it's a giant mess since many people use pyarrow to write Parquet files to be consumed by JVM-based systems. I think we can infer that LZ4 is not often used from the fact that we haven't had more bug reports about it.
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437573 @pitrou @xhochy It seems that despite adding the LZ4_FRAME format we've been continuing to use LZ4_RAW for Parquet files. Unfortunate that this hasn't seen more compatibility testing.
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658431456 No problem, I can take it from here.
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658429763 @patrickpai do you expect to complete this today? We are hoping to cut a release candidate tomorrow during the workday in Central European Time, so I can help finish this if needed.
[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
wesm commented on pull request #7757: URL: https://github.com/apache/arrow/pull/7757#issuecomment-658393753 Ah, I see that you're adding Python changes. I fixed the lint problems here, so be sure to rebase your changes.