pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-696630511
Need to add a test with the legacy file in
https://github.com/apache/arrow-testing/pull/47
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-694320305
I'll be able to do that.
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-674893008
@patrickpai Do you have some time to make the desired changes here?
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660342064
Well, this change is certainly not going to land in 1.0, so I think you can add the write path here if you want.
Also, in addition to the Hadoop-encoded LZ4 file, it would [...]
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660293362
I don't think you need to do it in the constructor; you can simply do it when decompression is called.
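(For illustration, a minimal sketch of what decompress-time detection could look like; the names and layout below are hypothetical, not Arrow's actual API.)
```cpp
// Hypothetical sketch (illustrative names, not Arrow's actual API):
// decide the framing lazily, inside the decompression call, instead of
// committing to a format in the codec's constructor.
#include <cstdint>
#include <lz4.h>

// Bytes 4-7 of a (single-block) Hadoop LZ4 frame hold the compressed size
// as a big-endian uint32, which must equal the buffer length minus the
// 8-byte header. Real Hadoop streams may concatenate several blocks;
// this sketch only handles the single-block case.
bool LooksLikeHadoopFrame(const uint8_t* input, int64_t input_len) {
  if (input_len < 8) return false;
  const uint32_t compressed_size =
      (uint32_t{input[4]} << 24) | (uint32_t{input[5]} << 16) |
      (uint32_t{input[6]} << 8) | uint32_t{input[7]};
  return compressed_size == static_cast<uint32_t>(input_len - 8);
}

// Plain LZ4 block decompression; returns the decompressed size or -1.
int64_t RawDecompress(const uint8_t* in, int64_t in_len,
                      uint8_t* out, int64_t out_len) {
  const int n = LZ4_decompress_safe(
      reinterpret_cast<const char*>(in), reinterpret_cast<char*>(out),
      static_cast<int>(in_len), static_cast<int>(out_len));
  return n < 0 ? -1 : n;
}

int64_t Decompress(const uint8_t* input, int64_t input_len,
                   uint8_t* output, int64_t output_len) {
  if (LooksLikeHadoopFrame(input, input_len)) {
    // Skip the 8-byte Hadoop header and decompress the raw block after it.
    return RawDecompress(input + 8, input_len - 8, output, output_len);
  }
  // Otherwise assume a plain LZ4 block.
  return RawDecompress(input, input_len, output, output_len);
}
```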
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660275706
> There would be a performance cost when attempting to read data pages that were written with incompatible lz4 codec
Why can't you implement the heuristic I outlined above?
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660272945
Hmm, I think the guess is extremely likely to be correct. There's a tiny chance that bytes 4-7 of a non-Hadoop-compressed file would be equal to the compressed buffer size - 8.
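(To make that concrete: a toy snippet with made-up sizes showing the invariant every single-block Hadoop LZ4 frame satisfies, and which a raw LZ4 block would match only by a roughly 1-in-2^32 accident.)
```cpp
// Toy illustration (made-up sizes): a Hadoop LZ4 frame prepends two
// big-endian uint32 values (decompressed size, compressed size) to the
// raw LZ4 block, so bytes 4-7 always decode to input_len - 8.
#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  const uint32_t decompressed_size = 256;  // hypothetical
  const uint32_t compressed_size = 100;    // hypothetical
  std::vector<uint8_t> frame(8 + compressed_size, 0);  // payload left empty
  for (int i = 0; i < 4; ++i) {
    frame[i] = static_cast<uint8_t>(decompressed_size >> (24 - 8 * i));
    frame[4 + i] = static_cast<uint8_t>(compressed_size >> (24 - 8 * i));
  }
  const uint32_t read_back =
      (uint32_t{frame[4]} << 24) | (uint32_t{frame[5]} << 16) |
      (uint32_t{frame[6]} << 8) | uint32_t{frame[7]};
  assert(read_back == frame.size() - 8);  // the heuristic's test
  return 0;
}
```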
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660271155
We're not the only ones producing Parquet files.
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660265813
Well, you can read the compressed size in bytes 4-7 and see if that corresponds to the actual buffer size you got. If by chance it corresponds but it is not actually [...]
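(One possible way to handle that rare false positive, sketched with the hypothetical helpers from the earlier snippet: if the size check matches but the inner block fails to decompress, retry the whole buffer as plain LZ4.)
```cpp
#include <cstdint>
// Reuses the hypothetical helpers sketched in the comment above:
bool LooksLikeHadoopFrame(const uint8_t* input, int64_t input_len);
int64_t RawDecompress(const uint8_t* in, int64_t in_len,
                      uint8_t* out, int64_t out_len);

// Fallback logic for the rare false positive: the size check matched,
// but the payload is not actually a Hadoop frame.
int64_t DecompressWithFallback(const uint8_t* input, int64_t input_len,
                               uint8_t* output, int64_t output_len) {
  if (LooksLikeHadoopFrame(input, input_len)) {  // bytes 4-7 == len - 8
    int64_t n = RawDecompress(input + 8, input_len - 8, output, output_len);
    if (n >= 0) return n;  // Hadoop interpretation worked
  }
  // Either the size check failed or the inner block was invalid:
  // treat the buffer as a plain LZ4 block.
  return RawDecompress(input, input_len, output, output_len);
}
```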
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660254916
> Should we make every Codec know its corresponding compression type enum?
> Another approach would be to define another codec, LZ4_HADOOP, [...]

Both approaches sound ok
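(A sketch of the second option, with hypothetical names: a distinct LZ4_HADOOP enum value that a codec factory maps to the compatibility codec.)
```cpp
// Hypothetical sketch of the "separate codec" approach: a distinct enum
// value so callers can ask for Hadoop-framed LZ4 explicitly. Names are
// illustrative, not the actual Arrow definitions.
#include <memory>

namespace Compression {
enum type { UNCOMPRESSED, SNAPPY, GZIP, LZ4, LZ4_HADOOP /* new */, ZSTD };
}

class Codec { public: virtual ~Codec() = default; /* ... */ };
class Lz4Codec : public Codec { /* raw LZ4 blocks */ };
class Lz4HadoopCodec : public Codec { /* Hadoop-framed LZ4 */ };

std::unique_ptr<Codec> MakeCodec(Compression::type t) {
  switch (t) {
    case Compression::LZ4:
      return std::make_unique<Lz4Codec>();
    case Compression::LZ4_HADOOP:
      return std::make_unique<Lz4HadoopCodec>();
    default:
      return nullptr;  // other codecs elided in this sketch
  }
}
```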
pitrou commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-659994574
We certainly don't want to do this on the Arrow side (the codecs may be used for something other than Parquet), but rather on the Parquet side.
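(A sketch of that placement, reusing the hypothetical enum, Codec class, and MakeCodec() factory from the sketch above: the Parquet layer substitutes the tolerant codec, leaving the generic Arrow LZ4 codec untouched.)
```cpp
// Hypothetical sketch: keep the workaround in the Parquet layer so Arrow's
// generic LZ4 codec (also used outside Parquet) stays unchanged.
// Reuses Compression::type, Codec, and MakeCodec() declared above.
#include <memory>

std::unique_ptr<Codec> MakeParquetCodec(Compression::type t) {
  if (t == Compression::LZ4) {
    // Parquet files in the wild mix raw-LZ4 and Hadoop-framed LZ4,
    // so substitute the tolerant codec here only.
    return MakeCodec(Compression::LZ4_HADOOP);
  }
  return MakeCodec(t);
}
```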