[GitHub] [parquet-mr] xjlem commented on a diff in pull request #1024: PARQUET-2242:record count for row group size check configurable

2023-02-13 Thread via GitHub
xjlem commented on code in PR #1024: URL: https://github.com/apache/parquet-mr/pull/1024#discussion_r1104147184 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java: ## @@ -142,6 +142,8 @@ public static enum JobSummaryLevel { public static final

[jira] [Commented] (PARQUET-2242) record count for row group size check configurable

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687805#comment-17687805 ] ASF GitHub Bot commented on PARQUET-2242: xjlem commented on code in PR #1024: URL:

[GitHub] [parquet-mr] xjlem commented on a diff in pull request #1024: PARQUET-2242:record count for row group size check configurable

2023-02-13 Thread via GitHub
xjlem commented on code in PR #1024: URL: https://github.com/apache/parquet-mr/pull/1024#discussion_r1104154913 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordWriter.java: ## @@ -147,12 +152,12 @@ private void checkBlockSizeReached() throws

[jira] [Commented] (PARQUET-2242) record count for row group size check configurable

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687809#comment-17687809 ] ASF GitHub Bot commented on PARQUET-2242: xjlem commented on code in PR #1024: URL:

[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2023-02-13 Thread via GitHub
pitrou commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1427745167 > It might have missed it but I didn't see Julien's reply on the dev mailing list. This seems reasonable though. For full disclosure, it was a discussion involving the Parquet

[jira] [Commented] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687886#comment-17687886 ] ASF GitHub Bot commented on PARQUET-758: pitrou commented on PR #184: URL:

[GitHub] [parquet-mr] wgtmac opened a new pull request, #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
wgtmac opened a new pull request, #1026: URL: https://github.com/apache/parquet-mr/pull/1026 ### Jira https://issues.apache.org/jira/browse/PARQUET-2228 ### Tests - Refactor and add various cases to `ParquetRewriterTest` for merging files. ### Commits -

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687977#comment-17687977 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac opened a new pull request, #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
wgtmac commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1104610792 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687985#comment-17687985 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac commented on code in PR #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1024: PARQUET-2242:record count for row group size check configurable

2023-02-13 Thread via GitHub
wgtmac commented on code in PR #1024: URL: https://github.com/apache/parquet-mr/pull/1024#discussion_r1104614696 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java: ## @@ -142,6 +142,8 @@ public static enum JobSummaryLevel { public static

[jira] [Commented] (PARQUET-2242) record count for row group size check configurable

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687987#comment-17687987 ] ASF GitHub Bot commented on PARQUET-2242: wgtmac commented on code in PR #1024: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1428175723 @ggershinsky @shangxinli PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688001#comment-17688001 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac commented on PR #1026: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1025: PARQUET-2241: Fix ByteStreamSplitValuesReader with nulls

2023-02-13 Thread via GitHub
shangxinli commented on code in PR #1025: URL: https://github.com/apache/parquet-mr/pull/1025#discussion_r1104758035 ## parquet-column/src/main/java/org/apache/parquet/column/values/bytestreamsplit/ByteStreamSplitValuesReader.java: ## @@ -58,15 +58,19 @@ protected void

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688027#comment-17688027 ] ASF GitHub Bot commented on PARQUET-2241: shangxinli commented on code in PR #1025: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
shangxinli commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1104767997 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688029#comment-17688029 ] ASF GitHub Bot commented on PARQUET-2228: shangxinli commented on code in PR #1026: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
shangxinli commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1104769279 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688030#comment-17688030 ] ASF GitHub Bot commented on PARQUET-2228: shangxinli commented on code in PR #1026: URL:

[GitHub] [parquet-format] julienledem commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2023-02-13 Thread via GitHub
julienledem commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1428363226 @emkornfield apologies for this, I realize I replied to a thread on the private list. @pitrou emailed private@ to get more eyes on this. Which worked. But yes, this discussion

[jira] [Commented] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688043#comment-17688043 ] ASF GitHub Bot commented on PARQUET-758: julienledem commented on PR #184: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
wgtmac commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1105191438 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688224#comment-17688224 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac commented on code in PR #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-13 Thread via GitHub
wgtmac commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1105193381 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688227#comment-17688227 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac commented on code in PR #1026: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1025: PARQUET-2241: Fix ByteStreamSplitValuesReader with nulls

2023-02-13 Thread via GitHub
wgtmac commented on code in PR #1025: URL: https://github.com/apache/parquet-mr/pull/1025#discussion_r1105198239 ## parquet-column/src/main/java/org/apache/parquet/column/values/bytestreamsplit/ByteStreamSplitValuesReader.java: ## @@ -58,15 +58,19 @@ protected void

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688231#comment-17688231 ] ASF GitHub Bot commented on PARQUET-2241: wgtmac commented on code in PR #1025: URL:

[GitHub] [parquet-mr] xjlem closed pull request #1024: PARQUET-2242:record count for row group size check configurable

2023-02-13 Thread via GitHub
xjlem closed pull request #1024: PARQUET-2242:record count for row group size check configurable URL: https://github.com/apache/parquet-mr/pull/1024

[GitHub] [parquet-mr] xjlem commented on a diff in pull request #1024: PARQUET-2242:record count for row group size check configurable

2023-02-13 Thread via GitHub
xjlem commented on code in PR #1024: URL: https://github.com/apache/parquet-mr/pull/1024#discussion_r1105253278 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java: ## @@ -142,6 +142,8 @@ public static enum JobSummaryLevel { public static final

[jira] [Commented] (PARQUET-2242) record count for row group size check configurable

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688257#comment-17688257 ] ASF GitHub Bot commented on PARQUET-2242: xjlem commented on code in PR #1024: URL: