[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702450#comment-17702450 ] ASF GitHub Bot commented on PARQUET-: - wgtmac commented on PR #193: URL:

[GitHub] [parquet-format] wgtmac commented on pull request #193: PARQUET-2222: RLE encoding spec incorrect for v2 data pages

2023-03-19 Thread via GitHub
wgtmac commented on PR #193: URL: https://github.com/apache/parquet-format/pull/193#issuecomment-1475639855 > Also, please someone with better knowledge of parquet-mr comment on [#193 (comment)](https://github.com/apache/parquet-format/pull/193#issuecomment-1474171946). I agree with

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702294#comment-17702294 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1042: PARQUET-2254 Support building dynamic bloom filter that adapts to the data

2023-03-19 Thread via GitHub
yabola commented on code in PR #1042: URL: https://github.com/apache/parquet-mr/pull/1042#discussion_r1141357582 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/DynamicBlockBloomFilter.java: ## @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache

[jira] [Updated] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2254: -- Description: h3. Why are the changes needed? Currently, a bloom filter is configured by specifying the NDV (number of

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702293#comment-17702293 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702292#comment-17702292 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702291#comment-17702291 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1042: PARQUET-2254 Support building dynamic bloom filter that adapts to the data

2023-03-19 Thread via GitHub
yabola commented on code in PR #1042: URL: https://github.com/apache/parquet-mr/pull/1042#discussion_r1141356477 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/DynamicBlockBloomFilter.java: ## @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702289#comment-17702289 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702287#comment-17702287 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on PR #1042: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1042: PARQUET-2254 Support building dynamic bloom filter that adapts to the data

2023-03-19 Thread via GitHub
yabola commented on code in PR #1042: URL: https://github.com/apache/parquet-mr/pull/1042#discussion_r1141355634 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/DynamicBlockBloomFilter.java: ## @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache

[GitHub] [parquet-mr] yabola commented on pull request #1042: PARQUET-2254 Support building dynamic bloom filter that adapts to the data

2023-03-19 Thread via GitHub
yabola commented on PR #1042: URL: https://github.com/apache/parquet-mr/pull/1042#issuecomment-1475247698

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702288#comment-17702288 ] ASF GitHub Bot commented on PARQUET-2254: - yabola commented on code in PR #1042: URL:

[jira] [Commented] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702283#comment-17702283 ] ASF GitHub Bot commented on PARQUET-2260: - yabola commented on PR #1043: URL:

[GitHub] [parquet-mr] yabola commented on pull request #1043: PARQUET-2260 Bloom filter size shouldn't be larger than maxBytes in the configuration

2023-03-19 Thread via GitHub
yabola commented on PR #1043: URL: https://github.com/apache/parquet-mr/pull/1043#issuecomment-1475243273 @wgtmac @gszadovszky If you have time, please take a look, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[jira] [Commented] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702282#comment-17702282 ] ASF GitHub Bot commented on PARQUET-2260: - yabola commented on PR #1043: URL:

[GitHub] [parquet-mr] yabola commented on pull request #1043: PARQUET-2260 Bloom filter size shouldn't be larger than maxBytes in the configuration

2023-03-19 Thread via GitHub
yabola commented on PR #1043: URL: https://github.com/apache/parquet-mr/pull/1043#issuecomment-1475243085 @wgtmac @ggershinsky If you have time, please take a look, thank you

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: -- Description: Before this PR: If the {{parquet.bloom.filter.max.bytes}} configuration value is not a power of 2,

[GitHub] [parquet-mr] yabola opened a new pull request, #1043: Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Thread via GitHub
yabola opened a new pull request, #1043: URL: https://github.com/apache/parquet-mr/pull/1043 If the `parquet.bloom.filter.max.bytes` configuration value is not a power of 2, the generated bloom filter will be larger than this value. For example, now if set
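
The snippet above is cut off before its example, so the following is only a minimal sketch of how a configured limit could be exceeded. It assumes the filter size is rounded up to the next power of two, which the truncated text does not confirm. The class name, both helper methods, and the value 1000 are hypothetical and are not taken from parquet-mr.

```java
public class BloomFilterSizeSketch {
    // Hypothetical helper: smallest power of two >= n, for 1 <= n <= 2^30.
    static int roundUpToPowerOfTwo(int n) {
        int floor = Integer.highestOneBit(n);
        return floor == n ? n : floor << 1;
    }

    // Hypothetical helper: largest power of two <= n, for n >= 1.
    static int roundDownToPowerOfTwo(int n) {
        return Integer.highestOneBit(n);
    }

    public static void main(String[] args) {
        int maxBytes = 1000; // illustrative limit, not a value from the thread
        // Rounding up overshoots the configured limit.
        System.out.println(roundUpToPowerOfTwo(maxBytes));   // 1024
        // Rounding down stays within the configured limit.
        System.out.println(roundDownToPowerOfTwo(maxBytes)); // 512
    }
}
```

Under that assumption, a limit that is already a power of two is left unchanged by either helper, so only non-power-of-two limits would be affected.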

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: -- Summary: Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration (was: Bloom

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: -- Description: If the `parquet.bloom.filter.max.bytes` configuration value is not a power of 2, the size of the

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: -- Summary: Bloom filter bytes size should't be larger than maxBytes size in the configuration (was: Bloom

[jira] [Assigned] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars reassigned PARQUET-2260: - Assignee: Mars > Bloom filter bytes size should't be larger than maxBytes size in the >

[jira] [Created] (PARQUET-2260) Bloom filter bytes size should't be larger than `parquet.bloom.filter.max.bytes` in the configuration

2023-03-19 Thread Mars (Jira)
Mars created PARQUET-2260: - Summary: Bloom filter bytes size should't be larger than `parquet.bloom.filter.max.bytes` in the configuration Key: PARQUET-2260 URL: https://issues.apache.org/jira/browse/PARQUET-2260