[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658468591


   OK, writing is disabled, but old files can still be read:
   
   ```
   In [2]: pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')
   ---------------------------------------------------------------------------
   OSError                                   Traceback (most recent call last)
   <ipython-input-2-...> in <module>
   ----> 1 pq.write_table(table, 'not_allowed.parquet.lz4', compression='lz4')

   ~/code/arrow/python/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, write_statistics, use_deprecated_int96_timestamps, coerce_timestamps, allow_truncated_timestamps, data_page_size, flavor, filesystem, compression_level, use_byte_stream_split, data_page_version, **kwargs)
      1632                 data_page_version=data_page_version,
      1633                 **kwargs) as writer:
   -> 1634             writer.write_table(table, row_group_size=row_group_size)
      1635     except Exception:
      1636         if _is_path_like(where):

   ~/code/arrow/python/pyarrow/parquet.py in write_table(self, table, row_group_size)
       586             raise ValueError(msg)
       587
   --> 588         self.writer.write_table(table, row_group_size=row_group_size)
       589
       590     def close(self):

   ~/code/arrow/python/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.write_table()
      1406
      1407         with nogil:
   -> 1408             check_status(self.writer.get()
      1409                          .WriteTable(deref(ctable), c_row_group_size))
      1410

   ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
        97                 raise IOError(errno, message)
        98             else:
   ---> 99                 raise IOError(message)
       100         elif status.IsOutOfMemory():
       101             raise ArrowMemoryError(message)

   OSError: Per ARROW-9424, writing files with LZ4 compression has been disabled until implementation issues have been resolved. It is recommended to read any existing files and rewrite them using a different compression.
   In ../src/parquet/arrow/writer.cc, line 684, code: WriteColumnChunk(table.column(i), offset, size)

   In [3]: pq.read_table('example.parquet.lz4').to_pandas()
   Out[3]:
      f0
   0   1
   1   2
   2   3
   3   4
   4   5
   ```
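
The error message above recommends reading existing LZ4 files and rewriting them with a different compression. A minimal sketch of that migration follows. The helper name `rewrite_parquet` and the default codec `snappy` are assumptions made for illustration and are not part of pyarrow or of this pull request; only `pq.read_table` and `pq.write_table`, both shown in the session above, are real pyarrow calls.

```python
def rewrite_parquet(src, dst, compression="snappy"):
    """Read a Parquet file and write it back using a different codec.

    Hypothetical helper for illustration; it is not a pyarrow API.
    Returns the number of rows copied.
    """
    # Refuse LZ4 up front, since the point of the rewrite is to move away from it.
    if compression is not None and compression.lower() == "lz4":
        raise ValueError("choose a codec other than LZ4 for the rewritten file")

    # Imported lazily so the argument check above runs even without pyarrow.
    import pyarrow.parquet as pq

    # Reading existing LZ4-compressed files still works, as In [3] shows.
    table = pq.read_table(src)
    pq.write_table(table, dst, compression=compression)
    return table.num_rows
```

A call such as `rewrite_parquet('example.parquet.lz4', 'example.parquet')` would then produce a Snappy-compressed copy, assuming the source file is readable by the installed pyarrow build.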



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658438428


   I don't recall, but that may have been the case. Either way, it's a giant mess 
since many people use pyarrow to write Parquet files to be consumed by 
JVM-based systems. I think we can infer that LZ4 is not often used from the 
fact that we haven't had more bug reports about it.







[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658437573


   @pitrou @xhochy It seems that despite adding the LZ4_FRAME format, we've 
continued to use LZ4_RAW for Parquet files. It's unfortunate that this hasn't 
seen more compatibility testing.







[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658431456


   No problem, I can take it from here. 







[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658429763


   @patrickpai do you expect to complete this today? We are hoping to cut a 
release candidate tomorrow during the workday, Central European Time, so I can 
help finish this if needed.







[GitHub] [arrow] wesm commented on pull request #7757: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

2020-07-14 Thread GitBox


wesm commented on pull request #7757:
URL: https://github.com/apache/arrow/pull/7757#issuecomment-658393753


   Ah, I see that you're adding Python changes. I fixed the lint problems here, 
so be sure to rebase your changes.


