ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-916116455
> it seems there is also a bug in parquet-cpp which causes incorrect file
> offset to be written, see https://issues.apache.org/jira/browse/SPARK-36696, so
> we'll want to
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-916104298
Yep, looks good. The first row group always starts at offset 4.
@loudongfeng Maybe the hardcoded `4` should be replaced with e.g.
`ParquetFileWriter.MAGIC.length`? Or even
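For illustration only, a minimal sketch of deriving the first row group's offset from the length of the leading magic instead of a literal `4`. The class and method names are hypothetical, and the `MAGIC` / `EF_MAGIC` constants are local stand-ins assumed to mirror the format's 4-byte `PAR1` (plaintext footer) and `PARE` (encrypted footer) markers; they are not references to parquet-mr's own fields.

```java
import java.nio.charset.StandardCharsets;

final class FirstRowGroupOffset {
    // Local stand-ins (assumed to match the leading magic bytes of a Parquet file):
    // "PAR1" for plaintext-footer files, "PARE" for encrypted-footer files.
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    static final byte[] EF_MAGIC = "PARE".getBytes(StandardCharsets.US_ASCII);

    private FirstRowGroupOffset() {}

    /** Offset where the first row group begins: immediately after the leading magic. */
    static long expectedFirstOffset(boolean encryptedFooter) {
        return (encryptedFooter ? EF_MAGIC : MAGIC).length;
    }

    /** Hypothetical sanity check for a first-row-group offset read from the footer. */
    static boolean hasExpectedFirstOffset(long storedOffset, boolean encryptedFooter) {
        return storedOffset == expectedFirstOffset(encryptedFooter);
    }
}
```

Both magic strings are four bytes long, so the result is `4` either way; naming the source of the value just keeps the intent visible at the call site.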
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-912932050
Thanks @loudongfeng, looks good. I'll run the last round of checks with a
number of encryption modes early next week.
--
This is an automated message from the Apache
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-909119450
> FYI, maybe we can make use of this information:
> RowGroup[n].file_offset = RowGroup[n-1].file_offset + RowGroup[n-1].total_compressed_size
> total_compressed_size
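The recurrence quoted above can be sketched as follows. This is a minimal illustration, not parquet-mr code: `RowGroupOffsets` and `computeOffsets` are hypothetical names, and the sketch assumes row groups are written back to back with no alignment padding between them, since padding would break the recurrence.

```java
final class RowGroupOffsets {
    private RowGroupOffsets() {}

    /**
     * Rebuilds row-group start offsets from the recurrence
     *   offset[0] = magicLength
     *   offset[n] = offset[n-1] + totalCompressedSize[n-1]
     * Assumes row groups are contiguous (no padding between them).
     */
    static long[] computeOffsets(long[] totalCompressedSizes, int magicLength) {
        long[] offsets = new long[totalCompressedSizes.length];
        long next = magicLength; // first row group starts right after the leading magic
        for (int i = 0; i < totalCompressedSizes.length; i++) {
            if (totalCompressedSizes[i] < 0) {
                throw new IllegalArgumentException("negative size for row group " + i);
            }
            offsets[i] = next;
            next = Math.addExact(next, totalCompressedSizes[i]); // fail loudly on overflow
        }
        return offsets;
    }
}
```

For example, `computeOffsets(new long[] {100, 250, 50}, 4)` yields `{4, 104, 354}`.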
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-909113979
@gszadovszky No problem at all, thank you for helping with this!
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908463834
Yep, but the current fix perpetuates the situation where some readers can't
process encrypted files, even if they have keys for all projected columns;
doesn't look like an
ggershinsky commented on pull request #925:
URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908443796
Sure. This won't work if the first column is encrypted and the reader
doesn't have its key. Can the "write" part be fixed instead, so the RowGroup
offset is set correctly?
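To make the concern concrete, a minimal sketch contrasting two ways of locating a row group. `RowGroupView` is a hypothetical, simplified stand-in for footer metadata (not a parquet-mr or parquet-format type); an empty `firstColumnStart` models a reader that cannot decrypt the first column's metadata because it lacks that column's key. The sketch assumes the first column chunk is written first in each row group, and the size-based path additionally assumes contiguous row groups, as in the recurrence quoted earlier.

```java
import java.util.OptionalLong;

final class RowGroupStartResolver {
    private RowGroupStartResolver() {}

    /** Hypothetical, simplified per-row-group metadata as seen by one reader. */
    static final class RowGroupView {
        final long totalCompressedSize;
        // Empty when the first column is encrypted and this reader has no key for it.
        final OptionalLong firstColumnStart;

        RowGroupView(long totalCompressedSize, OptionalLong firstColumnStart) {
            this.totalCompressedSize = totalCompressedSize;
            this.firstColumnStart = firstColumnStart;
        }
    }

    /** Start taken from the first column chunk: unavailable without that column's key. */
    static OptionalLong fromFirstColumn(RowGroupView group) {
        return group.firstColumnStart;
    }

    /**
     * Start computed from row-group-level sizes only, so no column key is needed
     * (the reader must still be able to read the footer itself).
     */
    static long fromRowGroupSizes(RowGroupView[] groups, int index, int magicLength) {
        long offset = magicLength;
        for (int i = 0; i < index; i++) {
            offset += groups[i].totalCompressedSize;
        }
        return offset;
    }
}
```

The size-based path does not depend on which column keys the reader holds, which is in line with the suggestion to fix the write side so the `RowGroup` offset itself is correct.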