[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
wgtmac commented on code in PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070233210 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java: ## @@ -394,4 +395,21 @@ public long hash(float value) {

[jira] [Commented] (PARQUET-2226) Support union Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676864#comment-17676864 ] ASF GitHub Bot commented on PARQUET-2226: - wgtmac commented on code in PR #1020: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
yabola commented on code in PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070267028 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BloomFilter.java: ## @@ -176,4 +176,10 @@ public String toString() { * @return

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-14 Thread GitBox
gszadovszky commented on code in PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#discussion_r1070274495 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -0,0 +1,733 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676905#comment-17676905 ] ASF GitHub Bot commented on PARQUET-2075: - gszadovszky commented on code in PR #1014: URL:

[jira] [Updated] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2226: -- Summary: Support merge Bloom Filter (was: Support union Bloom Filter) > Support merge Bloom Filter >

[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676900#comment-17676900 ] ASF GitHub Bot commented on PARQUET-2226: - gszadovszky commented on PR #1020: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
gszadovszky commented on PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1382736990 Thanks, @yabola for working on this and also to @wgtmac for reviewing. I do not have much experience with bloom filters so I will rely on your review. Ping me if you have a +1.

[jira] [Commented] (PARQUET-2226) Support union Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676897#comment-17676897 ] ASF GitHub Bot commented on PARQUET-2226: - yabola commented on code in PR #1020: URL:

[jira] [Commented] (PARQUET-2226) Support union Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676898#comment-17676898 ] ASF GitHub Bot commented on PARQUET-2226: - yabola commented on code in PR #1020: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
yabola commented on code in PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070267080 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java: ## @@ -394,4 +395,21 @@ public long hash(float value) {

[GitHub] [parquet-mr] gszadovszky commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
gszadovszky commented on PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1382737603 One more thing, @yabola. The compatibility tests fail because you have added a new method to a public interface. Even though this interface is not supposed to be implemented by

[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676901#comment-17676901 ] ASF GitHub Bot commented on PARQUET-2226: - gszadovszky commented on PR #1020: URL:

[jira] [Created] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-01-14 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2228: Summary: ParquetRewriter supports more than one input file Key: PARQUET-2228 URL: https://issues.apache.org/jira/browse/PARQUET-2228 Project: Parquet Issue Type:

[jira] [Created] (PARQUET-2229) ParquetRewriter supports masking and encrypting the same column

2023-01-14 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2229: Summary: ParquetRewriter supports masking and encrypting the same column Key: PARQUET-2229 URL: https://issues.apache.org/jira/browse/PARQUET-2229 Project: Parquet

[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676907#comment-17676907 ] ASF GitHub Bot commented on PARQUET-2226: - yabola commented on code in PR #1020: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
yabola commented on code in PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070267028 ## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BloomFilter.java: ## @@ -176,4 +176,10 @@ public String toString() { * @return

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676912#comment-17676912 ] ASF GitHub Bot commented on PARQUET-2075: - gszadovszky commented on PR #1014: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-14 Thread GitBox
gszadovszky commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382754916 > * I'd prefer creating a new JIRA for this refactor to be a prerequisite. Merging multiple files to a single one with customized pruning, encryption, and codec is also in my mind

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676916#comment-17676916 ] ASF GitHub Bot commented on PARQUET-2075: - wgtmac commented on PR #1014: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-14 Thread GitBox
wgtmac commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382815489 > > * I'd prefer creating a new JIRA for this refactor to be a prerequisite. Merging multiple files to a single one with customized pruning, encryption, and codec is also in my mind

[jira] [Created] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-01-14 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2230: Summary: Add a new rewrite command powered by ParquetRewriter Key: PARQUET-2230 URL: https://issues.apache.org/jira/browse/PARQUET-2230 Project: Parquet Issue Type:

[jira] [Commented] (PARQUET-2227) Refactor different file rewriters to use single implementation

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676927#comment-17676927 ] ASF GitHub Bot commented on PARQUET-2227: - gszadovszky commented on PR #1014: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1014: PARQUET-2227: Refactor several file rewriters to use a new unified ParquetRewriter implementation

2023-01-14 Thread GitBox
gszadovszky commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382840526 > I am afraid some implementations may drop characters after `'\n'` when displaying the string content. Let me do some investigation. I do not have a strong opinion for

[GitHub] [parquet-mr] wgtmac commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-14 Thread GitBox
wgtmac commented on PR #1014: URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382752637 > I think it is a great refactor. Thanks a lot for working on it, @wgtmac! On the other hand, I've thought about PARQUET-2075 as a request for a new feature in `parquet-cli`

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676909#comment-17676909 ] ASF GitHub Bot commented on PARQUET-2075: - wgtmac commented on PR #1014: URL:

[jira] [Created] (PARQUET-2227) Refactor different file rewriters to use single implementation

2023-01-14 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2227: Summary: Refactor different file rewriters to use single implementation Key: PARQUET-2227 URL: https://issues.apache.org/jira/browse/PARQUET-2227 Project: Parquet

[GitHub] [parquet-mr] shangxinli commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-14 Thread GitBox
shangxinli commented on PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383006013 @chenjunjiedada Do you still have time to review this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676961#comment-17676961 ] ASF GitHub Bot commented on PARQUET-2226: - shangxinli commented on PR #1020: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1016: PARQUET-2223: Parquet Data Masking Enhancement for Column Encryption

2023-01-14 Thread GitBox
shangxinli commented on PR #1016: URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1383006808 @ggershinsky Do you have time to have a look?

[jira] [Commented] (PARQUET-2223) Parquet Data Masking for Column Encryption

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676963#comment-17676963 ] ASF GitHub Bot commented on PARQUET-2223: - shangxinli commented on PR #1016: URL:

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

2023-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676962#comment-17676962 ] ASF GitHub Bot commented on PARQUET-2219: - shangxinli merged PR #1018: URL:

[GitHub] [parquet-mr] shangxinli merged pull request #1018: PARQUET-2219: ParquetFileReader skips empty row group

2023-01-14 Thread GitBox
shangxinli merged PR #1018: URL: https://github.com/apache/parquet-mr/pull/1018