Gidon Gershinsky created PARQUET-2297:
-----------------------------------------

             Summary: Encrypted files should not be checked for delta encoding problem
                 Key: PARQUET-2297
                 URL: https://issues.apache.org/jira/browse/PARQUET-2297
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.13.0
            Reporter: Gidon Gershinsky
            Assignee: Gidon Gershinsky
             Fix For: 1.14.0, 1.13.1


The delta encoding problem (https://issues.apache.org/jira/browse/PARQUET-246) 
has been fixed in writers since parquet-mr 1.8. That fix also added a 
`checkDeltaByteArrayProblem` method to readers, which runs over all columns and 
checks older files for this problem.

This check now triggers an unrelated exception when reading encrypted files, in 
the following situation: reading an unencrypted column without having the keys 
for the encrypted columns (see https://issues.apache.org/jira/browse/PARQUET-2193). 
This happens in Spark with nested columns; files with only regular columns are 
not affected.

Possible solution: don't call the `checkDeltaByteArrayProblem` method for 
encrypted files, because such files can only be written with parquet-mr 1.12 
and newer, where the delta encoding problem is already fixed.
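
A minimal sketch of the proposed guard, for illustration only. It does not use 
the parquet-mr API: `ColumnInfo`, `DeltaCheckGuard`, `scanAllColumns`, 
`isEncryptedFile` and `checkIfNeeded` are hypothetical names, and treating a 
file as encrypted when any of its columns is encrypted is an assumption of the 
sketch. `scanAllColumns` stands in for a check that visits every column and 
therefore fails on an encrypted column whose key is missing.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified view of one column as seen by a reader. */
final class ColumnInfo {
    final String path;
    final boolean encrypted;
    final boolean keyAvailable;

    ColumnInfo(String path, boolean encrypted, boolean keyAvailable) {
        this.path = path;
        this.encrypted = encrypted;
        this.keyAvailable = keyAvailable;
    }
}

/** Hypothetical guard: skip the all-columns check for encrypted files. */
final class DeltaCheckGuard {

    /** Stand-in for a check that runs over all columns; returns how many it visited. */
    static int scanAllColumns(List<ColumnInfo> columns) {
        int scanned = 0;
        for (ColumnInfo c : columns) {
            // An encrypted column cannot be inspected without its key.
            if (c.encrypted && !c.keyAvailable) {
                throw new IllegalStateException("no key for column " + c.path);
            }
            scanned++;
        }
        return scanned;
    }

    /** Assumption of this sketch: a file is encrypted if any column is encrypted. */
    static boolean isEncryptedFile(List<ColumnInfo> columns) {
        for (ColumnInfo c : columns) {
            if (c.encrypted) {
                return true;
            }
        }
        return false;
    }

    /** Runs the scan only for unencrypted files; returns 0 when the scan is skipped. */
    static int checkIfNeeded(List<ColumnInfo> columns) {
        if (isEncryptedFile(columns)) {
            return 0; // written by a newer writer, so the old problem cannot occur
        }
        return scanAllColumns(columns);
    }

    public static void main(String[] args) {
        List<ColumnInfo> mixed = Arrays.asList(
                new ColumnInfo("id", false, false),
                new ColumnInfo("payload.secret", true, false));
        System.out.println("columns scanned: " + checkIfNeeded(mixed));
    }
}
```

With the guard in place, a reader that only wants the unencrypted `id` column 
never touches `payload.secret`, so the missing key no longer matters for this 
check.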



