[jira] [Updated] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-06-16 Yujiang Zhong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yujiang Zhong updated PARQUET-2160: --- Description: The decompressed stream in HeapBytesDecompressor$decompress now relies on the
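The issue title asks for the decompression stream to be closed promptly so that its off-heap memory is freed in time. The sketch below shows only the general pattern and makes assumptions. It uses java.util.zip's GZIPInputStream as a stand-in for the codec stream in HeapBytesDecompressor, whose code it does not reproduce. It uses try-with-resources to close the stream deterministically. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class; not part of parquet-mr.
public class CloseDecompressionStream {

    // Decompresses a gzip payload and closes the stream as soon as it is drained.
    // GZIPInputStream creates and owns a native Inflater; close() calls
    // Inflater.end(), releasing that native (off-heap) state right away instead
    // of when the unreachable stream is eventually cleaned up by the GC.
    static byte[] decompress(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that produces gzip input for the example.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "page bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decompress(compress(raw)), StandardCharsets.UTF_8)); // page bytes
    }
}
```

The point of the pattern is that native resources are released when the block exits. Release no longer depends on when the garbage collector happens to run.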

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-06-16 Timothy Miller (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555142#comment-17555142 ] Timothy Miller commented on PARQUET-2159: - If this is already being generated at runtime, then

[jira] [Comment Edited] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-06-16 Fang-Xie (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555116#comment-17555116 ] Fang-Xie edited comment on PARQUET-2159 at 6/16/22 3:14 PM: We implemented

[GitHub] [parquet-mr] huaxingao commented on pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-16 GitBox
huaxingao commented on PR #975: URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1157762209 > it should be good enough to also check the lower limit, e.g. exist > totalCount * (testFpp[i] * 0.9), or exist > totalCount * (testFpp[i] * 0.5), or even exist > 0. What do you
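The quoted suggestion bounds the observed false-positive count on both sides. `exist` is the number of never-inserted values that the filter still reports as present, and `totalCount` is the number of values probed. The sketch below is a hypothetical helper and is not the test code in PR #975. It checks the upper limit `exist < totalCount * fpp` together with the lower limit proposed in the thread, using a configurable factor such as 0.9 or 0.5.

```java
// Hypothetical helper; not the code under review in PR #975.
public class FppBoundsCheck {

    // Upper limit: the observed false positives stay below totalCount * fpp.
    // Lower limit (the extra check proposed in the thread): they stay above
    // totalCount * fpp * lowerFactor, e.g. lowerFactor = 0.9 or 0.5.
    static boolean withinBounds(long exist, long totalCount, double fpp, double lowerFactor) {
        double expected = totalCount * fpp;
        return exist < expected && exist > expected * lowerFactor;
    }

    public static void main(String[] args) {
        long totalCount = 100_000;
        double fpp = 0.01; // roughly 1000 false positives expected
        System.out.println(withinBounds(950, totalCount, fpp, 0.9));  // true
        System.out.println(withinBounds(1200, totalCount, fpp, 0.9)); // false: above the upper limit
        System.out.println(withinBounds(300, totalCount, fpp, 0.5));  // false: below the lower limit
    }
}
```

The "or even exist > 0" variant in the quote corresponds to `lowerFactor = 0`.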

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-16 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555118#comment-17555118 ] ASF GitHub Bot commented on PARQUET-2157: - huaxingao commented on PR #975: URL:

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-06-16 Fang-Xie (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555116#comment-17555116 ] Fang-Xie commented on PARQUET-2159: --- We implemented Parquet bit packing en/decode using JDK Vector
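The comment describes a JDK Vector API implementation, which this snippet does not show. For comparison only, the following is a plain scalar sketch of the LSB-first bit packing that the Parquet encodings specification describes for bit-packed runs. It is not the vectorized code under discussion, and the class name is hypothetical. With values 0 to 7 at bit width 3, it produces the bytes 10001000 11000110 11111010 from the spec's example.

```java
import java.util.Arrays;

// Hypothetical scalar baseline; not the Vector API implementation.
public class ScalarBitPacking {

    // Packs the low `bitWidth` bits (1..32) of each value, LSB-first:
    // value i occupies stream bits [i*bitWidth, (i+1)*bitWidth), and stream
    // bit j lives in byte j/8 at bit position j%8 (0 = least significant).
    static byte[] pack(int[] values, int bitWidth) {
        byte[] out = new byte[(values.length * bitWidth + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            for (int k = 0; k < bitWidth; k++) {
                if (((values[i] >>> k) & 1) != 0) {
                    int bit = i * bitWidth + k;
                    out[bit >>> 3] |= (byte) (1 << (bit & 7));
                }
            }
        }
        return out;
    }

    // Reverses pack(): reads `count` values of `bitWidth` bits each.
    static int[] unpack(byte[] packed, int count, int bitWidth) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int k = 0; k < bitWidth; k++) {
                int bit = i * bitWidth + k;
                if (((packed[bit >>> 3] >>> (bit & 7)) & 1) != 0) {
                    v |= 1 << k;
                }
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] packed = pack(values, 3);
        StringBuilder hex = new StringBuilder();
        for (byte b : packed) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());                             // 88 c6 fa
        System.out.println(Arrays.toString(unpack(packed, values.length, 3))); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

The per-bit loops keep the bit layout easy to check. They are also the kind of inner loop that table-driven, generated, or vectorized unpackers are meant to replace.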

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-16 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555115#comment-17555115 ] ASF GitHub Bot commented on PARQUET-2157: - huaxingao commented on code in PR #975: URL:

[GitHub] [parquet-mr] huaxingao commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-16 GitBox
huaxingao commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r899177750 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -282,6 +286,63 @@ public void testParquetFileWithBloomFilter() throws

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-06-16 Timothy Miller (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555099#comment-17555099 ] Timothy Miller commented on PARQUET-2159: - I frequently wish Java had a preprocessor like C++

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-06-16 Yujiang Zhong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555082#comment-17555082 ] Yujiang Zhong commented on PARQUET-2160: [~shangxinli] [~dongjoon] Can you please take a look

[jira] [Created] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-06-16 Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2160: -- Summary: Close decompression stream to free off-heap memory in time Key: PARQUET-2160 URL: https://issues.apache.org/jira/browse/PARQUET-2160 Project: Parquet

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-06-16 Fang-Xie (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555081#comment-17555081 ] Fang-Xie commented on PARQUET-2159: --- Thanks [~theosib-amazon], these improvements depend on Vector

[jira] [Updated] (PARQUET-2051) AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation

2022-06-16 Andreas Hailu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Hailu updated PARQUET-2051: --- Fix Version/s: 1.12.3 > AvroWriteSupport does not pass Configuration to

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-16 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555041#comment-17555041 ] ASF GitHub Bot commented on PARQUET-2157: - ggershinsky commented on PR #975: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-16 GitBox
ggershinsky commented on PR #975: URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1157577513 > The test takes about 2300 milliseconds on my laptop. Ok, this is reasonable. If this time is sufficient for reliably testing the upper limit of FPPs, it should be good

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-16 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554932#comment-17554932 ] ASF GitHub Bot commented on PARQUET-2157: - chenjunjiedada commented on code in PR #975: URL:

[GitHub] [parquet-mr] chenjunjiedada commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-16 GitBox
chenjunjiedada commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r898756998 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -282,6 +286,63 @@ public void testParquetFileWithBloomFilter() throws