[jira] [Created] (PARQUET-2242) record count for row group size check configurable

2023-02-09 Thread xjlem (Jira)
xjlem created PARQUET-2242: -- Summary: record count for row group size check configurable Key: PARQUET-2242 URL: https://issues.apache.org/jira/browse/PARQUET-2242 Project: Parquet Issue Type:

[jira] [Updated] (PARQUET-2242) record count for row group size check configurable

2023-02-09 Thread xjlem (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xjlem updated PARQUET-2242: --- Description:  org.apache.parquet.hadoop.InternalParquetRecordWriter#checkBlockSizeReached {code:java}  

[DISCUSS] ByteStreamSplitDecoder broken in presence of nulls

2023-02-09 Thread wish maple
This problem is shown in this issue: https://github.com/apache/arrow/issues/15173 Let me talk about it briefly: * The encoder doesn't write "num_values" in the page payload for BYTE_STREAM_SPLIT, but uses "num_values" as the stride in BYTE_STREAM_SPLIT * When decoding, for DATA_PAGE_V2, it can know the
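
To illustrate why the stride matters here, the following is a minimal sketch of a BYTE_STREAM_SPLIT-style layout for 4-byte floats. It is not the Arrow or parquet-mr implementation: the class name `ByteStreamSplitSketch` and its methods are hypothetical, and the sketch assumes the payload holds only the values that were actually encoded. Byte k of value i is stored at offset k * n + i, so the decoder can only reassemble the values if it is given the same n the encoder used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class, not part of Arrow or parquet-mr.
public class ByteStreamSplitSketch {

    // Scatter the 4 little-endian bytes of each float into 4 byte streams.
    // Byte k of value i lands at offset k * n + i, where n is the stride.
    static byte[] encode(float[] values) {
        int n = values.length;
        byte[] out = new byte[n * Float.BYTES];
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            buf.putFloat(0, values[i]);
            for (int k = 0; k < Float.BYTES; k++) {
                out[k * n + i] = buf.get(k);
            }
        }
        return out;
    }

    // Gather the bytes back. The stride is not stored in the payload,
    // so the caller has to supply the number of encoded values.
    static float[] decode(byte[] data, int stride) {
        float[] out = new float[stride];
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < stride; i++) {
            for (int k = 0; k < Float.BYTES; k++) {
                buf.put(k, data[k * stride + i]);
            }
            out[i] = buf.getFloat(0);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] encoded = {1.5f, -2.0f, 3.25f};
        byte[] page = encode(encoded);
        // In this sketch the stride is recovered from the payload length.
        int stride = page.length / Float.BYTES;
        System.out.println(Arrays.toString(decode(page, stride)));
    }
}
```

In this sketch, passing any stride other than the count used by the encoder either reads the wrong bytes for each value or indexes past the end of the payload, which is the kind of mismatch the thread is concerned with.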

[GitHub] [parquet-mr] xjlem opened a new pull request, #1024: Parquet 2242:record count for row group size check configurable

2023-02-09 Thread via GitHub
xjlem opened a new pull request, #1024: URL: https://github.com/apache/parquet-mr/pull/1024 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-09 Thread via GitHub
wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102202759 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2229) ParquetRewriter supports masking and encrypting the same column

2023-02-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686872#comment-17686872 ] ASF GitHub Bot commented on PARQUET-2229: - wgtmac commented on PR #1021: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1021: PARQUET-2229: ParquetRewriter masks and encrypts the same column

2023-02-09 Thread via GitHub
wgtmac commented on PR #1021: URL: https://github.com/apache/parquet-mr/pull/1021#issuecomment-1425179900 Thanks @ggershinsky and @shangxinli ! Feel good to merge it now? Also cc @gszadovszky -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1024: Parquet 2242:record count for row group size check configurable

2023-02-09 Thread via GitHub
wgtmac commented on code in PR #1024: URL: https://github.com/apache/parquet-mr/pull/1024#discussion_r1102185588 ## parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java: ## @@ -95,6 +98,8 @@ private ParquetProperties(WriterVersion writerVersion, int

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686841#comment-17686841 ] ASF GitHub Bot commented on PARQUET-2237: - wgtmac commented on code in PR #1023: URL: