[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1108046928 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689553#comment-17689553 ] ASF GitHub Bot commented on PARQUET-2237: - wgtmac commented on code in PR #1023: URL:

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689514#comment-17689514 ] ASF GitHub Bot commented on PARQUET-2243: - wgtmac commented on code in PR #1027: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1108004026 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432572977 I haven't encountered any troubles caused by this situation in practice. I found this while looking at the code, when evaluating `notIn`, dictionary filter returns

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689557#comment-17689557 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028: URL:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432596607 I didn't think about comparisons with non-null values before submitting this PR. I don't know if there is a downstream project that relies on Parquet judging `value <> null` as TRUE instead

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689522#comment-17689522 ] ASF GitHub Bot commented on PARQUET-2244: - wgtmac commented on PR #1028: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1029: PARQUET-2245: Improve dictionary filter evaluating notEq

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1029: URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java: ## @@ -187,10 +196,7 @@ public > Boolean visit(NotEq notEq) {

[jira] [Commented] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689543#comment-17689543 ] ASF GitHub Bot commented on PARQUET-2245: - wgtmac commented on code in PR #1029: URL:

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689546#comment-17689546 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
wgtmac commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432527608 > I did a quick test using Spark > > ``` > Seq("A", "A", null).toDF("column").repartition(1).write.mode("overwrite").parquet("t") >

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689552#comment-17689552 ] ASF GitHub Bot commented on PARQUET-2237: - wgtmac commented on code in PR #1023: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102205460 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688999#comment-17688999 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang opened a new pull request, #1028: URL:

[GitHub] [parquet-mr] gszadovszky opened a new pull request, #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
gszadovszky opened a new pull request, #1027: URL: https://github.com/apache/parquet-mr/pull/1027 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688932#comment-17688932 ] ASF GitHub Bot commented on PARQUET-2243: - gszadovszky opened a new pull request, #1027: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431102177 @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1028: URL: https://github.com/apache/parquet-mr/pull/1028 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references

[jira] [Created] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2244: -- Summary: Dictionary filter may skip row-groups incorrectly when evaluating notIn Key: PARQUET-2244 URL: https://issues.apache.org/jira/browse/PARQUET-2244

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431067528 @huaxingao @gszadovszky Can you help review this? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689002#comment-17689002 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028: URL:

[jira] [Updated] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Yujiang Zhong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yujiang Zhong updated PARQUET-2244: --- Description: Dictionary filter may skip row-groups incorrectly when evaluating `notIn` on

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1106931407 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689007#comment-17689007 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on code in PR #1026: URL:

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689011#comment-17689011 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on PR #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107006966 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import

[jira] [Commented] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689114#comment-17689114 ] ASF GitHub Bot commented on PARQUET-2246: - zhongyujiang opened a new pull request, #1030: URL:

[jira] [Commented] (PARQUET-1950) Define core features / compliance level

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689124#comment-17689124 ] ASF GitHub Bot commented on PARQUET-1950: - gszadovszky commented on PR #164: URL:

[jira] [Assigned] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2244: - Assignee: Yujiang Zhong > Dictionary filter may skip row-groups incorrectly

[jira] [Resolved] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2244. --- Resolution: Fixed > Dictionary filter may skip row-groups incorrectly when

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431180145 @gszadovszky Thanks for reviewing and the quick merge!

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689045#comment-17689045 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028: URL:

[jira] [Created] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2245: -- Summary: Improve dictionary filter evaluating notEq Key: PARQUET-2245 URL: https://issues.apache.org/jira/browse/PARQUET-2245 Project: Parquet Issue

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1030: PARQUET-2246: Add short circuit logic to column index filter

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1030: URL: https://github.com/apache/parquet-mr/pull/1030 Jira: [PARQUET-2246](https://issues.apache.org/jira/browse/PARQUET-2246)

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689034#comment-17689034 ] ASF GitHub Bot commented on PARQUET-2244: - gszadovszky commented on PR #1028: URL:

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689035#comment-17689035 ] ASF GitHub Bot commented on PARQUET-2244: - gszadovszky merged PR #1028: URL:

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689046#comment-17689046 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431182101 > @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431185993 > > @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689103#comment-17689103 ] ASF GitHub Bot commented on PARQUET-2243: - gszadovszky commented on code in PR #1027: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431359196 > You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good? @gszadovszky

[GitHub] [parquet-mr] gszadovszky merged pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
gszadovszky merged PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028

[GitHub] [parquet-mr] gszadovszky commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431153336 @shangxinli, it might require a backport and releases on the branches where `In` and `NotIn` were released.

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689062#comment-17689062 ] ASF GitHub Bot commented on PARQUET-2243: - wgtmac commented on code in PR #1027: URL:

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1029: PARQUET-2245: Improve dictionary filter evaluating notEq

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1029: URL: https://github.com/apache/parquet-mr/pull/1029 JIRA: [PARQUET-2245](https://issues.apache.org/jira/browse/PARQUET-2245) This is a minor improvement for evaluating `notEq`. When evaluating `notEq`, if the column may contain nulls and

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689115#comment-17689115 ] ASF GitHub Bot commented on PARQUET-2159: - gszadovszky commented on code in PR #1011: URL:

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1107005376 ## parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689118#comment-17689118 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on PR #1026: URL:

[GitHub] [parquet-format] gszadovszky commented on pull request #164: PARQUET-1950: Define core features

2023-02-15 Thread via GitHub
gszadovszky commented on PR #164: URL: https://github.com/apache/parquet-format/pull/164#issuecomment-1431376526 I don't know of other implementations either. Since `parquet-format` is managed by this community I would expect the "implementors" to listen to the dev mailing list at least. I

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689050#comment-17689050 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL:

[jira] [Commented] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689088#comment-17689088 ] ASF GitHub Bot commented on PARQUET-2245: - zhongyujiang opened a new pull request, #1029: URL:

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107084792 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import

[jira] [Created] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2246: -- Summary: Add short circuit logic to column index filter Key: PARQUET-2246 URL: https://issues.apache.org/jira/browse/PARQUET-2246 Project: Parquet Issue

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431423625 > > You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good? @gszadovszky > >

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689138#comment-17689138 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL:

[GitHub] [parquet-mr] huaxingao commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
huaxingao commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432481988 I did a quick test using Spark ``` Seq("A", "A", null).toDF("column").repartition(1).write.mode("overwrite").parquet("t")

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689493#comment-17689493 ] ASF GitHub Bot commented on PARQUET-2244: - huaxingao commented on PR #1028: URL: