[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323506069 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323505565 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + +

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-11 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1322319582 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320231941 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320143220 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319461836 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319366325 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318930925 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,73 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319135598 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,73 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319219710 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319130534 ## src/main/thrift/parquet.thrift: ## @@ -583,7 +659,12 @@ struct DataPageHeaderV2 { If missing it is considered compressed */ 7: optional bool

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319130288 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318928626 ## src/main/thrift/parquet.thrift: ## @@ -529,7 +596,15 @@ struct DataPageHeader { /** Encoding used for repetition levels **/ 4: required Encoding

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318059860 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317866553 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317867372 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-02 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313882045 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313652871 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313392696 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313391714 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313391303 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313329517 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +845,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + +

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-25 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1305871232 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1302257411 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1302161195 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1301975357 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1301969299 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,62 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * Tracks a histogram of repetition and

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1301968222 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,62 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * Tracks a histogram of repetition and