[
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2226:
--
Summary: Support union Bloom Filter (was: Support union Bloom Filter
operation)
> Support union Bloom
[
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2226:
--
Description:
We need to collect Parquet's bloom filters from multiple files, and then
synthesize a more
Mars created PARQUET-2226:
-
Summary: Support union Bloom Filter operation
Key: PARQUET-2226
URL: https://issues.apache.org/jira/browse/PARQUET-2226
Project: Parquet
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2226:
--
Summary: Support merge Bloom Filter (was: Support union Bloom Filter)
> Support merge Bloom Filter
>
[
https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697987#comment-17697987
]
Mars commented on PARQUET-2254:
---
[~wgtmac] [~gszadovszky]
1) This Jira is used to track the building of
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars reassigned PARQUET-2237:
-
Assignee: Mars
> Improve performance when filters in RowGroupFilter can match exactly
>
Mars created PARQUET-2254:
-
Summary: Build a BloomFilter with a more precise size
Key: PARQUET-2254
URL: https://issues.apache.org/jira/browse/PARQUET-2254
Project: Parquet
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2254:
--
Description:
Currently, the user specifies the size and then builds the BloomFilter. In general
scenarios, it is
[
https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2254:
--
Description:
h3. Why are the changes needed?
Currently, the bloom filter is used by specifying the NDV (number of
[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2260:
--
Description:
If `parquet.bloom.filter.max.bytes` configuration is not a power of 2 value,
the size of the
[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2260:
--
Description:
Before this PR: If {{parquet.bloom.filter.max.bytes}} configuration is not a
power of 2 value,
[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars reassigned PARQUET-2260:
-
Assignee: Mars
> Bloom filter bytes size should't be larger than maxBytes size in the
>
[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2260:
--
Summary: Bloom filter bytes size should't be larger than maxBytes size in
the configuration (was: Bloom
Mars created PARQUET-2260:
-
Summary: Bloom filter bytes size should't be larger than
`parquet.bloom.filter.max.bytes` in the configuration
Key: PARQUET-2260
URL: https://issues.apache.org/jira/browse/PARQUET-2260
[
https://issues.apache.org/jira/browse/PARQUET-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2260:
--
Summary: Bloom filter bytes size shouldn't be larger than maxBytes size in
the configuration (was: Bloom
[
https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2251:
--
Description: In parquet pageV1, even if all pages of one column are encoded by
dictionary (was: In parquet
[
https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2251:
--
Summary: Avoid generating Bloomfilter when all pages of a column are
encoded by dictionary (was: Avoid
[
https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2251:
--
Description:
In parquet pageV1, even if all pages of a column are encoded by dictionary, it
will still
Mars created PARQUET-2251:
-
Summary: Avoid generating Bloomfilter when all pages of one column
are encoded by dictionary
Key: PARQUET-2251
URL: https://issues.apache.org/jira/browse/PARQUET-2251
Project:
[
https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2251:
--
Description: In parquet pageV1,
> Avoid generating Bloomfilter when all pages of one column are encoded by
[
https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars resolved PARQUET-2251.
---
Resolution: Fixed
> Avoid generating Bloomfilter when all pages of a column are encoded by
> dictionary
>
Mars created PARQUET-2237:
-
Summary: Improve performance when filters in RowGroupFilter can
match exactly
Key: PARQUET-2237
URL: https://issues.apache.org/jira/browse/PARQUET-2237
Project: Parquet
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2237:
--
Description:
The Bloom filter needs to be loaded from the filesystem, which may cost time and space. If we
can
When the
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2237:
--
Description:
The Bloom filter needs to be loaded from the filesystem, which may cost time and space. If we
can exactly
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2237:
--
Description:
The Bloom filter needs to be loaded from the filesystem, which may cost time and memory. If we
can exactly
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2237:
--
Description:
If we can accurately judge by the minMax status, we don’t need to load the
dictionary from
[
https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated PARQUET-2254:
--
Description:
*Why are the changes needed?*
Currently, the bloom filter is used by specifying the NDV (number of