[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739052#comment-17739052 ] ASF GitHub Bot commented on PARQUET-2249: - gszadovszky commented on PR #196: UR

[GitHub] [parquet-format] gszadovszky commented on pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
gszadovszky commented on PR #196: URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1614549513 @mapleFU, as I've written before that's why we initiated [ColumnOrder](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L863) to make the forma

[GitHub] [parquet-format] mapleFU commented on pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
mapleFU commented on PR #196: URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1614520051 > Currently the arrow-rs implementation uses the totalOrder predicate as defined by the IEEE 754 (2008 revision) floating point standard to order floats, this can be very efficiently

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739036#comment-17739036 ] ASF GitHub Bot commented on PARQUET-2249: - mapleFU commented on PR #196: URL: h

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739025#comment-17739025 ] ASF GitHub Bot commented on PARQUET-2249: - tustvold commented on PR #196: URL:

[GitHub] [parquet-format] tustvold commented on pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
tustvold commented on PR #196: URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1614476748 > I wonder for PageIndex pruning in Rust implementations Currently the arrow-rs implementation uses the totalOrder predicate as defined by the IEEE 754 (2008 revision) floating
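As a point of reference for the totalOrder discussion, here is a minimal Rust sketch of how a totalOrder comparison of 64-bit floats can be done with a single integer comparison. It uses the same sign-bit flipping trick as the standard library's `f64::total_cmp`. It is not taken from arrow-rs, and the helper names (`total_order_key`, `total_cmp`, `sort_total_order`) are hypothetical. Under totalOrder, NaNs with the sign bit set sort below -inf, NaNs without it sort above +inf, and -0.0 sorts below +0.0, so every value, NaN included, has a fixed position.

```rust
use std::cmp::Ordering;

/// Maps an f64 to an i64 key whose ordinary signed-integer order matches the
/// IEEE 754-2008 totalOrder predicate (the same bit trick used by std's
/// `f64::total_cmp`).
fn total_order_key(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // `bits >> 63` is -1 for negative values (sign bit set) and 0 otherwise.
    // Shifting that mask right by one as u64 yields 0x7FFF_FFFF_FFFF_FFFF or 0,
    // so for negative values every bit except the sign bit is flipped,
    // which makes larger-magnitude negatives compare smaller.
    bits ^ ((((bits >> 63) as u64) >> 1) as i64)
}

/// Compares two floats under totalOrder:
/// -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN.
fn total_cmp(a: f64, b: f64) -> Ordering {
    total_order_key(a).cmp(&total_order_key(b))
}

/// Sorts a slice in place by totalOrder; unlike sorting with `partial_cmp`,
/// this never has to special-case NaN.
fn sort_total_order(values: &mut [f64]) {
    values.sort_by(|a, b| total_cmp(*a, *b));
}
```

Because the key is a plain `i64`, a writer tracking min/max statistics could compare values with one integer comparison and no NaN branches; that is one plausible reading of why such an ordering "can be very efficiently" implemented.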

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739008#comment-17739008 ] ASF GitHub Bot commented on PARQUET-2249: - wgtmac commented on code in PR #196:

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
wgtmac commented on code in PR #196: URL: https://github.com/apache/parquet-format/pull/196#discussion_r1247688333 ## src/main/thrift/parquet.thrift: ## @@ -966,6 +985,23 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optiona
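The diff hunk in this comment is truncated, but the PR title indicates that a NaN count is being added to the statistics, apparently next to the existing per-page null counts in `ColumnIndex`. The following Rust sketch illustrates, under stated assumptions, why such a count can help page pruning: it assumes min/max are computed over non-null, non-NaN values only, and that a separate NaN count is kept per page. `PageStats`, `nan_count`, and all helper functions are hypothetical names for illustration, not the fields proposed in PR #196 or the API of any Parquet implementation.

```rust
/// Hypothetical per-page statistics for a floating-point column.
/// Names are illustrative only, not the fields proposed in PR #196.
/// `min`/`max` are assumed to be computed over non-null, non-NaN values.
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: Option<f64>, // None if the page holds only nulls and/or NaNs
    max: Option<f64>,
    nan_count: u64, // number of NaN values excluded from min/max
}

/// Could the page contain some value v with v > threshold?
/// NaN never satisfies `>` under IEEE comparison, so NaNs in the page
/// (or a NaN threshold) cannot force the page to be read.
fn may_match_greater_than(page: &PageStats, threshold: f64) -> bool {
    page.max.map_or(false, |max| max > threshold)
}

/// Could the page contain some value v with v < threshold?
fn may_match_less_than(page: &PageStats, threshold: f64) -> bool {
    page.min.map_or(false, |min| min < threshold)
}

/// Could the page contain a NaN (e.g. for an `is_nan(x)` filter)?
/// If min/max ignore NaNs and no count is stored, a reader would have to
/// answer "maybe" for every page.
fn may_contain_nan(page: &PageStats) -> bool {
    page.nan_count > 0
}

/// Indexes of the pages that must be read for the filter `x > threshold`.
fn pages_to_read_greater_than(pages: &[PageStats], threshold: f64) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, p)| may_match_greater_than(p, threshold))
        .map(|(i, _)| i)
        .collect()
}
```

In this sketch a page containing only NaNs has no min/max at all, so range predicates skip it while an `is_nan` filter still finds it through the count; how an all-NaN page should actually be encoded is one of the open questions in this thread.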

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738996#comment-17738996 ] ASF GitHub Bot commented on PARQUET-2249: - mapleFU commented on PR #196: URL: h

[GitHub] [parquet-format] mapleFU commented on pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
mapleFU commented on PR #196: URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1614407651 https://github.com/apache/parquet-format/pull/196#discussion_r1237381221 @alamb @tustvold Hi, for PageIndex pruning in Rust implementations, would it matter for adding `[-inf, +inf]

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738993#comment-17738993 ] ASF GitHub Bot commented on PARQUET-2249: - mapleFU commented on code in PR #196

[GitHub] [parquet-format] mapleFU commented on a diff in pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
mapleFU commented on code in PR #196: URL: https://github.com/apache/parquet-format/pull/196#discussion_r1247671677 ## src/main/thrift/parquet.thrift: ## @@ -966,6 +985,23 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[jira] [Updated] (PARQUET-2318) Implement a tool to list page headers

2023-06-30 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2318: Fix Version/s: 1.14.0

[jira] [Commented] (PARQUET-2318) Implement a tool to list page headers

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738956#comment-17738956 ] ASF GitHub Bot commented on PARQUET-2318: - gszadovszky merged PR #1117: URL: ht

[GitHub] [parquet-mr] gszadovszky merged pull request #1117: PARQUET-2318: Implement a tool to list page headers

2023-06-30 Thread via GitHub
gszadovszky merged PR #1117: URL: https://github.com/apache/parquet-mr/pull/1117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet

[jira] [Resolved] (PARQUET-2318) Implement a tool to list page headers

2023-06-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2318. Resolution: Fixed

[jira] [Commented] (PARQUET-2318) Implement a tool to list page headers

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738955#comment-17738955 ] ASF GitHub Bot commented on PARQUET-2318: - gszadovszky commented on PR #1117: U

[GitHub] [parquet-mr] gszadovszky commented on pull request #1117: PARQUET-2318: Implement a tool to list page headers

2023-06-30 Thread via GitHub
gszadovszky commented on PR #1117: URL: https://github.com/apache/parquet-mr/pull/1117#issuecomment-1614318162 Thanks for the review, @shangxinli!

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-06-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738941#comment-17738941 ] ASF GitHub Bot commented on PARQUET-2249: - gszadovszky commented on code in PR

[GitHub] [parquet-format] gszadovszky commented on a diff in pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

2023-06-30 Thread via GitHub
gszadovszky commented on code in PR #196: URL: https://github.com/apache/parquet-format/pull/196#discussion_r1247575939 ## src/main/thrift/parquet.thrift: ## @@ -966,6 +985,23 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: op