[jira] [Updated] (PARQUET-2226) Support union Bloom Filter

2023-01-12 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2226: Summary: Support union Bloom Filter (was: Support union Bloom Filter operation) > Support union Bloom

[jira] [Updated] (PARQUET-2226) Support union Bloom Filter

2023-01-12 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2226: Description: We need to collect the bloom filters of multiple Parquet files and then synthesize a more
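
The PARQUET-2226 description is cut off, so it does not say how the per-file filters would be combined. As a hedged sketch of the general technique: two Bloom filters built with the same bit-array size and the same hash functions can be unioned by OR-ing their bit arrays, because inserting a value only ever sets bits at positions determined by its hash and the filter size. `SimpleBloomFilter` is a hypothetical class (a classic bit-array Bloom filter with double hashing, not parquet-mr's `BlockSplitBloomFilter`), and callers are assumed to pass an already computed 64-bit hash (the Parquet bloom filter spec uses xxHash64 for this).

```java
/** Hypothetical bit-array Bloom filter used only to illustrate union-by-OR. */
final class SimpleBloomFilter {
    private final long[] words;   // bit array, 64 bits per word
    private final int numHashes;  // probe positions per value

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numBits % 64 != 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits must be a positive multiple of 64, numHashes > 0");
        }
        this.words = new long[numBits / 64];
        this.numHashes = numHashes;
    }

    private int numBits() {
        return words.length * 64;
    }

    // Double hashing: derive the i-th probe position from the two 32-bit halves of the hash.
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, numBits());
    }

    void insert(long hash) {
        for (int i = 0; i < numHashes; i++) {
            int bit = index(hash, i);
            words[bit >>> 6] |= 1L << (bit & 63);
        }
    }

    boolean mightContain(long hash) {
        for (int i = 0; i < numHashes; i++) {
            int bit = index(hash, i);
            if ((words[bit >>> 6] & (1L << (bit & 63))) == 0) {
                return false; // a required bit is unset: definitely absent
            }
        }
        return true; // possibly present (false positives are allowed)
    }

    /** Union is a bitwise OR, valid only when size and hash count match. */
    SimpleBloomFilter union(SimpleBloomFilter other) {
        if (words.length != other.words.length || numHashes != other.numHashes) {
            throw new IllegalArgumentException("filters must share size and hash count");
        }
        SimpleBloomFilter result = new SimpleBloomFilter(numBits(), numHashes);
        for (int w = 0; w < words.length; w++) {
            result.words[w] = words[w] | other.words[w];
        }
        return result;
    }
}
```

Filters of different sizes cannot be ORed directly; they would have to be rebuilt from the original values or hashes, which is why a merge operation needs the inputs to share one size.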

[jira] [Created] (PARQUET-2226) Support union Bloom Filter operation

2023-01-12 Mars (Jira)
Mars created PARQUET-2226: Summary: Support union Bloom Filter operation; Key: PARQUET-2226; URL: https://issues.apache.org/jira/browse/PARQUET-2226; Project: Parquet; Issue Type: Improvement

[jira] [Updated] (PARQUET-2226) Support union Bloom Filter operation

2023-01-12 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2226: Description: We need to collect the bloom filters of multiple Parquet files and then synthesize a more

[jira] [Updated] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2226: Summary: Support merge Bloom Filter (was: Support union Bloom Filter) > Support merge Bloom Filter

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-08 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697987#comment-17697987 ] Mars commented on PARQUET-2254: [~wgtmac] [~gszadovszky] 1) This Jira is used to track the building of

[jira] [Assigned] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-03-08 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars reassigned PARQUET-2237: Assignee: Mars > Improve performance when filters in RowGroupFilter can match exactly

[jira] [Created] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-06 Mars (Jira)
Mars created PARQUET-2254: Summary: Build a BloomFilter with a more precise size; Key: PARQUET-2254; URL: https://issues.apache.org/jira/browse/PARQUET-2254; Project: Parquet; Issue Type: Improvement

[jira] [Updated] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-06 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2254: Description: Currently, the usage is to specify the size and then build the BloomFilter. In general scenarios, it is

[jira] [Updated] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2254: Description: Why are the changes needed? Currently, using a bloom filter requires specifying the NDV (number of
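
The NDV-based sizing that PARQUET-2254 refers to can be illustrated with the classic Bloom filter formula m = -n * ln(p) / (ln 2)^2 bits for n distinct values and false-positive probability p, with k = (m / n) * ln 2 hash functions. This is only a sketch: parquet-mr's split block bloom filter has its own sizing helper, which this does not reproduce, and `BloomSizing` is a hypothetical name.

```java
/** Hypothetical helper showing classic Bloom filter sizing from NDV and FPP. */
final class BloomSizing {
    /** Bits needed for ndv distinct values at false-positive probability fpp. */
    static long optimalBits(long ndv, double fpp) {
        if (ndv <= 0) {
            throw new IllegalArgumentException("ndv must be positive");
        }
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("fpp must be in (0, 1)");
        }
        double ln2 = Math.log(2);
        // m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
        return (long) Math.ceil(-ndv * Math.log(fpp) / (ln2 * ln2));
    }

    /** Hash-function count that minimizes the FPP for the given size: k = (m / n) * ln 2. */
    static int optimalHashes(long ndv, long bits) {
        return Math.max(1, (int) Math.round((double) bits / ndv * Math.log(2)));
    }
}
```

For n = 1,000,000 and p = 0.01 this gives roughly 9.6 bits per value (about 1.2 MB in total) and 7 hash functions, showing how the size follows from the NDV and FPP rather than from a fixed byte budget.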

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: Description: If the `parquet.bloom.filter.max.bytes` configuration value is not a power of 2, the size of the
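
The PARQUET-2260 description is truncated here, so the exact before/after behavior is not shown. A minimal sketch of the property the summary asks for, assuming (as the description implies) that filter sizes are rounded up to a power of two: clamp the result to the largest power of two that does not exceed `parquet.bloom.filter.max.bytes`. The `chooseNumBytes` helper and its `minBytes` bound are hypothetical, not parquet-mr's code.

```java
/** Hypothetical sizing helper that never exceeds the configured maximum. */
final class BloomByteSize {
    static int chooseNumBytes(int requestedBytes, int minBytes, int maxBytes) {
        if (minBytes <= 0 || maxBytes < minBytes) {
            throw new IllegalArgumentException("invalid bounds");
        }
        // Largest power of two <= maxBytes, so a non-power-of-two maximum is never exceeded.
        int cap = Integer.highestOneBit(maxBytes);
        int bytes = Math.max(requestedBytes, minBytes);
        if (Integer.bitCount(bytes) != 1) {
            bytes = Integer.highestOneBit(bytes) << 1; // round up to the next power of two
        }
        if (bytes <= 0 || bytes > cap) {               // overflowed, or above the cap
            bytes = cap;
        }
        return bytes;
    }
}
```

With maxBytes = 1,048,577 (2^20 + 1), a request for 1,500,000 bytes yields 1,048,576 bytes rather than 2,097,152.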

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: Description: Before this PR: If the `parquet.bloom.filter.max.bytes` configuration value is not a power of 2,

[jira] [Assigned] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars reassigned PARQUET-2260: Assignee: Mars > Bloom filter bytes size should't be larger than maxBytes size in the

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size should't be larger than maxBytes size in the configuration

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: Summary: Bloom filter bytes size should't be larger than maxBytes size in the configuration (was: Bloom

[jira] [Created] (PARQUET-2260) Bloom filter bytes size should't be larger than `parquet.bloom.filter.max.bytes` in the configuration

2023-03-19 Mars (Jira)
Mars created PARQUET-2260: Summary: Bloom filter bytes size should't be larger than `parquet.bloom.filter.max.bytes` in the configuration; Key: PARQUET-2260; URL: https://issues.apache.org/jira/browse/PARQUET-2260

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-03-19 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: Summary: Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration (was: Bloom

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: Description: In parquet pageV1, even if all pages of one column are encoded by dictionary (was: In parquet

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: Summary: Avoid generating Bloomfilter when all pages of a column are encoded by dictionary (was: Avoid

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: Description: In parquet pageV1, even if all pages of a column are encoded by dictionary, it will still
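
The PARQUET-2251 description is truncated, but the idea in its summary can be sketched: when every data page of a column chunk is dictionary-encoded, the dictionary page already lists every distinct value, so it answers equality lookups exactly and a bloom filter adds no pruning power. The sketch uses a hypothetical `PageEncoding` enum (named after Parquet encodings but not parquet-mr's `Encoding` class) and a hypothetical `shouldWriteBloomFilter` check; how parquet-mr actually tracks dictionary fallback is not shown here.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical writer-side decision: skip the bloom filter for fully dictionary-encoded chunks. */
final class BloomFilterDecision {
    // Stand-in for the encodings a data page can use (illustrative only).
    enum PageEncoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED }

    private static final Set<PageEncoding> DICTIONARY_ENCODINGS =
            EnumSet.of(PageEncoding.PLAIN_DICTIONARY, PageEncoding.RLE_DICTIONARY);

    static boolean shouldWriteBloomFilter(List<PageEncoding> dataPageEncodings) {
        for (PageEncoding e : dataPageEncodings) {
            if (!DICTIONARY_ENCODINGS.contains(e)) {
                // A page fell back to a non-dictionary encoding, so its values may be
                // missing from the dictionary; the bloom filter is still useful.
                return true;
            }
        }
        // Every page is dictionary-encoded: the dictionary is an exact membership test.
        return false;
    }
}
```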

[jira] [Created] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Mars (Jira)
Mars created PARQUET-2251: Summary: Avoid generating Bloomfilter when all pages of one column are encoded by dictionary; Key: PARQUET-2251; URL: https://issues.apache.org/jira/browse/PARQUET-2251; Project:

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: Description: In parquet pageV1, > Avoid generating Bloomfilter when all pages of one column are encoded by

[jira] [Resolved] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-03-01 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars resolved PARQUET-2251. Resolution: Fixed > Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

[jira] [Updated] (PARQUET-2260) Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration

2023-04-03 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2260: Description: Before this PR: If the `parquet.bloom.filter.max.bytes` configuration value is not a power of 2,

[jira] [Created] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-04 Mars (Jira)
Mars created PARQUET-2237: Summary: Improve performance when filters in RowGroupFilter can match exactly; Key: PARQUET-2237; URL: https://issues.apache.org/jira/browse/PARQUET-2237; Project: Parquet

[jira] [Updated] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-04 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2237: Description: The bloom filter needs to be loaded from the filesystem, which may cost time and space. If we can When the

[jira] [Updated] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-04 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2237: Description: The bloom filter needs to be loaded from the filesystem, which may cost time and space. If we can exactly

[jira] [Updated] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-05 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2237: Description: The bloom filter needs to be loaded from the filesystem, which may cost time and memory. If we can exactly

[jira] [Updated] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-05 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2237: Description: If we can decide accurately from the min/max statistics, we don’t need to load the dictionary from
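
The PARQUET-2237 idea can be sketched for an equality predicate `column == value`: if the row group's min/max statistics already decide the outcome, there is no need to load the dictionary or bloom filter. This sketch assumes the statistics are present and exact (not truncated) and that the row group has at least one non-null value; `EqStatsCheck` and its three-way `Verdict` are hypothetical names, not parquet-mr's `RowGroupFilter` API.

```java
/** Hypothetical three-way evaluation of `column == value` from min/max statistics. */
final class EqStatsCheck {
    enum Verdict { DROP, KEEP, NEEDS_MORE_CHECKS }

    static Verdict evaluateEq(long value, long min, long max) {
        if (value < min || value > max) {
            return Verdict.DROP;              // value lies outside [min, max]: no row can match
        }
        if (min == max) {
            return Verdict.KEEP;              // min == max == value: at least one row matches
        }
        return Verdict.NEEDS_MORE_CHECKS;     // inconclusive: consult dictionary or bloom filter
    }
}
```

Only `NEEDS_MORE_CHECKS` requires reading extra structures from the filesystem; both other outcomes are exact, which is where the time and memory savings would come from.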

[jira] [Updated] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-05-11 Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2254: Description: Why are the changes needed? Currently, using a bloom filter requires specifying the NDV (number of