[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-09-09 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-916116455 > it seems there is also a bug in parquet-cpp which causes incorrect file offset to be written, see https://issues.apache.org/jira/browse/SPARK-36696, so we'll want to

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-09-09 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-916104298 Yep, looks good. The first row group always starts at offset 4. @loudongfeng Maybe the hardcoded `4` should be replaced with e.g. `ParquetFileWriter.MAGIC.length`? Or even
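
A minimal sketch of the suggestion above, deriving the first row group offset from the magic bytes rather than hardcoding `4`. It is not parquet-mr code: the class `FirstRowGroupOffset` and its `MAGIC` field are hypothetical stand-ins for `ParquetFileWriter.MAGIC`, and the sketch assumes the magic is the 4 ASCII bytes `PAR1` with the first row group written immediately after it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not part of parquet-mr.
class FirstRowGroupOffset {
    // Assumption: mirrors ParquetFileWriter.MAGIC, the ASCII bytes "PAR1".
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // The first row group is assumed to start right after the leading magic,
    // so its offset equals the magic length instead of a literal 4.
    static long firstRowGroupOffset() {
        return MAGIC.length;
    }

    public static void main(String[] args) {
        System.out.println(firstRowGroupOffset());
    }
}
```

Tying the offset to the constant keeps the two values in one place, so the reader-side check cannot drift from the bytes the writer actually emits.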

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-09-04 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-912932050 Thanks @loudongfeng, looks good. I'll run the last round of checks with a number of encryption modes early next week. -- This is an automated message from the Apache

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-09-01 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-909113979

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-31 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908443796

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-31 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-909119450 > FYI, maybe we can make use of this information: > RowGroup[n].file_offset = RowGroup[n-1].file_offset + RowGroup[n-1].total_compressed_size > total_compressed_size
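
The relation quoted above can be sketched as a running sum. This is an illustration under stated assumptions, not the fix in the pull request: the class `RowGroupOffsets` and the method `deriveFileOffsets` are hypothetical, the first row group is assumed to start at offset 4 (right after the magic), and row groups are assumed to be laid out contiguously with no padding between them.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of parquet-mr.
class RowGroupOffsets {
    // Assumption from the thread: the first row group starts at offset 4.
    static final long FIRST_ROW_GROUP_OFFSET = 4;

    // Applies RowGroup[n].file_offset =
    //   RowGroup[n-1].file_offset + RowGroup[n-1].total_compressed_size
    // given only the total_compressed_size of each row group, in file order.
    static long[] deriveFileOffsets(long[] totalCompressedSizes) {
        long[] offsets = new long[totalCompressedSizes.length];
        long next = FIRST_ROW_GROUP_OFFSET;
        for (int i = 0; i < totalCompressedSizes.length; i++) {
            offsets[i] = next;
            next += totalCompressedSizes[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] offsets = deriveFileOffsets(new long[] {100, 250, 75});
        System.out.println(Arrays.toString(offsets)); // prints [4, 104, 354]
    }
}
```

Because this uses only row-group-level sizes, it does not need to read any column metadata.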

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-31 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-909113979 @gszadovszky No problem at all, thank you for helping with this!

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908463834 Yep, but the current fix perpetuates the situation where some readers can't process encrypted files, even if they have keys for all projected columns; doesn't look like an

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908443796 Sure. This won't work if the first column is encrypted and the reader doesn't have its key. Can the "write" part be fixed instead, so the RowGroup offset is set correctly?