[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792698#comment-17792698 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1838002383

BTW, it would be good to add an interoperability test to read parquet files from here: https://github.com/apache/parquet-testing/commit/da467dac2f095b979af37bcf40fa0d1dee5ff652. You may want to take a look at this example: https://github.com/apache/parquet-mr/blob/44b56225be6fe7b74667f4f2430326ef1f076cc5/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java#L40

> [Java] support for Arrow's float16
> ----------------------------------
>
>                 Key: PARQUET-1647
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1647
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format, parquet-thrift
>            Reporter: The Alchemist
>            Priority: Minor
>
> h2. DESCRIPTION
>
> I'm wondering if there's any interest in supporting Arrow's {{float16}} type in Parquet.
> There seem to be one or two {{float16}} / {{halffloat}} tickets here (e.g., PARQUET-1403) but nothing that speaks to adding half-float support to Parquet in-general.
>
> h2. PLANS
> I'm able to spend some time on this, if someone points me in the right direction.
>
> # Add the {{HALFFLOAT}} or {{FLOAT16}} enum (any preferred naming convention?) to [https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L32]
> # Add {{HALFFLOAT}} to {{org.apache.parquet.schema.PrimitiveType}}
> # Add {{HALFFLOAT}} support to {{org.apache.parquet.arrow.schema.SchemaConverter}}
> # Add encoding for new type at {{org.apache.parquet.column.Encoding}}
> # ??
> If anyone has any interest in this, pointers, or comments, they would be greatly appreciated!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792667#comment-17792667 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413455235 ## pom.xml: ## @@ -596,6 +597,9 @@
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792628#comment-17792628 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413353347 ## pom.xml: ## @@ -596,6 +597,9 @@
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792621#comment-17792621 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1837800275 > Could you please rebase it? Rebased, can you help merge this PR?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791520#comment-17791520 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1833370284 Could you please rebase it?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789925#comment-17789925 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1827206658 > @zhangjiashen This can be rebased to adopt parquet-format 2.10.0 @wgtmac I just rebased with master branch and please help take a look when you get a chance?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789297#comment-17789297 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1825081998 @zhangjiashen This can be rebased to adopt parquet-format 2.10.0
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780598#comment-17780598 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783745746 > @wgtmac, I don't think we automatically deploy snapshot versions. And, we will need a final release of parquet-format anyway, before we can get this one merged. OK, then let's wait until format v2.10 is released. Once two PoC implementations of https://github.com/apache/parquet-format/pull/197 have been finished, I will kick off the release process.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780482#comment-17780482 ] ASF GitHub Bot commented on PARQUET-1647: - gszadovszky commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783252917 @wgtmac, I don't think we automatically deploy snapshot versions. And, we will need a final release of parquet-format anyway, before we can get this one merged.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780446#comment-17780446 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783165574 https://github.com/apache/parquet-format/pull/184 is merged. Could you try to set `parquet.format.version` to 2.10.0-SNAPSHOT in the pom.xml and check if the CIs are green?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779324#comment-17779324 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1371134589

## parquet-column/src/main/java/org/apache/parquet/schema/Float16.java:
##
@@ -46,29 +46,10 @@
  * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
  */
 public class Float16 {

Review Comment:
   updated them to non-public, thanks!
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779323#comment-17779323 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1371132290

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
     @Override
     public Statistics build() {
-      Float16Statistics stats = (Float16Statistics) super.build();
+      BinaryStatistics stats = (BinaryStatistics) super.build();
       if (stats.hasNonNullValue()) {
         Binary bMin = stats.genericGetMin();
         Binary bMax = stats.genericGetMax();
         short min = bMin.get2BytesLittleEndian();
         short max = bMax.get2BytesLittleEndian();
         // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
         if (Float16.isNaN(min) || Float16.isNaN(max)) {
-          bMin = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-          bMax = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+          bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});

Review Comment:
   updated, thanks!
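The min/max handling discussed in this hunk can be sketched in isolation. The following is only an illustration working on raw 16-bit patterns, not the code from the PR: the class name `Float16StatsSketch`, the `adjust` helper, and the choice to signal dropped bounds with `null` are all assumptions made for the example (the PR instead resets `hasNonNullValue` on the statistics object).

```java
// Hypothetical sketch; none of these names come from parquet-mr.
final class Float16StatsSketch {
    static final short POSITIVE_ZERO = (short) 0x0000; // +0.0 in binary16
    static final short NEGATIVE_ZERO = (short) 0x8000; // -0.0 in binary16

    /** NaN in binary16: exponent bits all ones and a non-zero significand. */
    static boolean isNaN(short h) {
        return (h & 0x7FFF) > 0x7C00;
    }

    /**
     * Returns {min, max} to store, or null when the bounds should be dropped.
     * Assumption: null stands in for "no usable min/max" in this sketch.
     */
    static short[] adjust(short min, short max) {
        // The ordering of NaN is undefined, so the bounds cannot be trusted.
        if (isNaN(min) || isNaN(max)) {
            return null;
        }
        // Widen the range so that neither +0.0 nor -0.0 is skipped by a filter.
        if (min == POSITIVE_ZERO) {
            min = NEGATIVE_ZERO;
        }
        if (max == NEGATIVE_ZERO) {
            max = POSITIVE_ZERO;
        }
        return new short[] {min, max};
    }

    public static void main(String[] args) {
        short[] bounds = adjust(POSITIVE_ZERO, (short) 0x3C00);
        System.out.printf("min=0x%04X max=0x%04X%n", bounds[0] & 0xFFFF, bounds[1] & 0xFFFF);
    }
}
```

The zero widening matters because +0.0 and -0.0 compare equal as numbers but have different bit patterns, so a reader filtering on either one should not be able to skip the page.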
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778553#comment-17778553 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1368247647

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
     @Override
     public Statistics build() {
-      Float16Statistics stats = (Float16Statistics) super.build();
+      BinaryStatistics stats = (BinaryStatistics) super.build();
       if (stats.hasNonNullValue()) {
         Binary bMin = stats.genericGetMin();
         Binary bMax = stats.genericGetMax();
         short min = bMin.get2BytesLittleEndian();
         short max = bMax.get2BytesLittleEndian();
         // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
         if (Float16.isNaN(min) || Float16.isNaN(max)) {
-          bMin = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-          bMax = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+          bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});
           stats.setMinMax(bMin, bMax);
           ((Statistics) stats).hasNonNullValue = false;
         } else {
           // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
-          if (min == Float16.POSITIVE_ZERO) {
-            bMin = Binary.fromConstantByteArray(Float16.NEGATIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          if (min == (short) 0x0000) {
+            bMin = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});

Review Comment:
   See above

## parquet-column/src/main/java/org/apache/parquet/schema/Float16.java:
##
@@ -46,29 +46,10 @@
  * Ref: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
  */
 public class Float16 {

Review Comment:
   Please make any methods that are used only from the same package `package-private`.

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
     @Override
     public Statistics build() {
-      Float16Statistics stats = (Float16Statistics) super.build();
+      BinaryStatistics stats = (BinaryStatistics) super.build();
       if (stats.hasNonNullValue()) {
         Binary bMin = stats.genericGetMin();
         Binary bMax = stats.genericGetMax();
         short min = bMin.get2BytesLittleEndian();
         short max = bMax.get2BytesLittleEndian();
         // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
         if (Float16.isNaN(min) || Float16.isNaN(max)) {
-          bMin = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-          bMax = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+          bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});
           stats.setMinMax(bMin, bMax);
           ((Statistics) stats).hasNonNullValue = false;
         } else {
           // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
-          if (min == Float16.POSITIVE_ZERO) {
-            bMin = Binary.fromConstantByteArray(Float16.NEGATIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          if (min == (short) 0x0000) {
+            bMin = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});
             stats.setMinMax(bMin, bMax);
           }
-          if (max == Float16.NEGATIVE_ZERO) {
-            bMax = Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+          if (max == (short) 0x8000) {
+            bMax = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});

Review Comment:
   See above

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
     @Override
     public Statistics build() {
-      Float16Statistics stats = (Float16Statistics) super.build();
+      BinaryStatistics stats = (BinaryStatistics) super.build();
       if (stats.hasNonNullValue()) {
         Binary bMin = stats.genericGetMin();
         Binary bMax = stats.genericGetMax();
         short min = bMin.get2BytesLittleEndian();
         short max = bMax.get2BytesLittleEndian();
         // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
         if (Float16.isNaN(min) || Float16.isNaN(max)) {
-
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778276#comment-17778276 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832738

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
   you are correct, we can use BinaryStatistics directly and we don't need to have Float16Statistics.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778275#comment-17778275 ]

ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832612

## parquet-column/src/test/java/org/apache/parquet/io/api/TestBinary.java:
##
@@ -268,4 +268,19 @@ public void testCompare() {
     assertTrue(b1.compareTo(b3) == 0);
     assertTrue(b3.compareTo(b1) == 0);
   }
+
+  @Test
+  public void testGet2BytesLittleEndian() {

Review Comment:
   Added unit tests for this, thanks
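The body of `testGet2BytesLittleEndian` is not shown in this message, so the following is only a guess at the behaviour being tested: reading a two-byte little-endian value into a `short`. The class `LittleEndianSketch` and both helpers are hypothetical and use only the JDK; they are not the `Binary` implementation from parquet-mr.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; not the Binary class from parquet-mr.
final class LittleEndianSketch {
    /** Combines two bytes, least significant first, into a short. */
    static short get2BytesLittleEndian(byte[] bytes) {
        if (bytes.length != 2) {
            throw new IllegalArgumentException("expected exactly 2 bytes, got " + bytes.length);
        }
        // Mask with 0xFF so negative bytes do not sign-extend into the high bits.
        return (short) (((bytes[1] & 0xFF) << 8) | (bytes[0] & 0xFF));
    }

    /** Same result using the JDK's ByteBuffer with an explicit byte order. */
    static short viaByteBuffer(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
    }

    public static void main(String[] args) {
        byte[] negativeZero = {0x00, (byte) 0x80};
        System.out.printf("0x%04X%n", get2BytesLittleEndian(negativeZero) & 0xFFFF);
    }
}
```

With this byte order, the array `{0x00, (byte) 0x80}` used in the statistics hunk decodes to the bit pattern `0x8000`, which is -0.0 in binary16.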
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778274#comment-17778274 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832542 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,307 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.util.Arrays; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
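The bit layout described in the Javadoc above (1 sign bit, 5 exponent bits, 10 significand bits) can be made concrete with a small decoder. This is a sketch written from the IEEE 754 binary16 definition, not the conversion code from the PR (which is truncated in this message); the class and method names are hypothetical.

```java
// Hypothetical illustration of the binary16 layout; not the PR's Float16 class.
final class Float16DecodeSketch {

  // Decode a float16 bit pattern (held in a short) into a Java float.
  static float toFloat(short h) {
    int bits = h & 0xffff;               // treat the short as unsigned
    int sign = bits >>> 15;              // 1 bit
    int exponent = (bits >>> 10) & 0x1f; // 5 bits, bias 15
    int significand = bits & 0x3ff;      // 10 bits

    float magnitude;
    if (exponent == 0) {
      // Zero or subnormal: no implicit leading 1, value = significand * 2^-24.
      magnitude = Math.scalb((float) significand, -24);
    } else if (exponent == 0x1f) {
      // All-ones exponent: infinity if the significand is zero, otherwise NaN.
      magnitude = significand == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
    } else {
      // Normal: (1 + significand / 1024) * 2^(exponent - 15),
      // computed here as (1024 + significand) * 2^(exponent - 25).
      magnitude = Math.scalb((float) (1024 + significand), exponent - 25);
    }
    return sign == 0 ? magnitude : -magnitude;
  }

  public static void main(String[] args) {
    short[] samples = {(short) 0xc000, (short) 0x7bff, (short) 0x0001, (short) 0x7c00};
    for (short s : samples) {
      System.out.println(Integer.toHexString(s & 0xffff) + " -> " + toFloat(s));
    }
  }
}
```

Every float16 value is exactly representable as a float, so this direction of the conversion is lossless; the reverse direction needs rounding and is not covered by this sketch.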
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778273#comment-17778273 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832331 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,339 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.nio.ByteBuffer; +import java.nio.ByteOrder; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776570#comment-17776570 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1363539290 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,307 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.util.Arrays; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776524#comment-17776524 ] ASF GitHub Bot commented on PARQUET-1647: - gszadovszky commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1363371525 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,307 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.util.Arrays; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776255#comment-17776255 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362333511 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,307 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.util.Arrays; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776254#comment-17776254 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362335145 ## parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.parquet.statistics; + +import org.apache.parquet.Preconditions; +import org.apache.parquet.example.data.Group; +import org.apache.parquet.example.data.GroupFactory; +import org.apache.parquet.example.data.simple.SimpleGroupFactory; +import org.apache.parquet.hadoop.ParquetFileReader; +import org.apache.parquet.hadoop.ParquetWriter; +import org.apache.parquet.internal.column.columnindex.ColumnIndex; +import org.apache.parquet.io.api.Binary; +import org.apache.parquet.schema.MessageType; +import org.apache.parquet.schema.Types; +import org.apache.parquet.type.Float16; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.column.statistics.Statistics; +import org.apache.parquet.hadoop.example.ExampleParquetWriter; +import org.apache.parquet.hadoop.example.GroupWriteSupport; +import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData; +import org.apache.parquet.hadoop.util.HadoopInputFile; + +import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY; +import static org.junit.Assert.assertEquals; + +public class TestFloat16Statistics { + + @Rule + public TemporaryFolder temp = new TemporaryFolder(); + + private short[] valuesInAscendingOrder = { +(short) 0xfc00, // -Infinity +(short) 0xc000, // -2.0 +-Float16.MAX_VALUE, // -6.109476E-5 +Float16.NEGATIVE_ZERO, // -0 +Float16.POSITIVE_ZERO, // +0 +Float16.MIN_VALUE, // 5.9604645E-8 +Float16.MAX_VALUE, // 65504.0 +(short) 0x7c00}; // Infinity + + private short[] valuesInAscendingOrderMinMax = { +(short) 0xfc00, // -Infinity +(short) 0x7c00}; // 
Infinity + + private short[] valuesInDescendingOrder = { +(short) 0x7c00, // Infinity +Float16.MAX_VALUE, // 65504.0 +Float16.MIN_VALUE, // 5.9604645E-8 +Float16.POSITIVE_ZERO, // +0 +Float16.NEGATIVE_ZERO, // -0 +-Float16.MAX_VALUE, // -6.109476E-5 +(short) 0xc000, // -2.0 +(short) 0xfc00}; // -Infinity + + private short[] valuesInDescendingOrderMinMax = { +(short) 0xfc00, // -Infinity +(short) 0x7c00}; // Infinity + + private short[] valuesUndefinedOrder = { +Float16.MAX_VALUE, // 65504.0 +(short) 0x7c00, // Infinity +Float16.NEGATIVE_ZERO, // -0 +Float16.MIN_VALUE, // 5.9604645E-8 +Float16.POSITIVE_ZERO, // +0 +(short) 0xc000, // -2.0 +-Float16.MAX_VALUE, // -6.109476E-5 +(short) 0xfc00}; // -Infinity + + private short[] valuesUndefinedOrderMinMax = { +(short) 0xfc00, // -Infinity +(short) 0x7c00}; // Infinity + + private short[] valuesAllPositiveZero = { +Float16.POSITIVE_ZERO, // +0 +Float16.POSITIVE_ZERO, // +0 +Float16.POSITIVE_ZERO, // +0 +Float16.POSITIVE_ZERO}; // +0 + + private short[] valuesAllPositiveZeroMinMax = { +Float16.POSITIVE_ZERO, // +0 +Float16.POSITIVE_ZERO}; // +0 + + // Float16Statistics: Updating min to -0.0 to ensure that no 0.0 values would be skipped + private short[] valuesAllPositiveStatsZeroMinMax = { +Float16.NEGATIVE_ZERO, // -0 +Float16.POSITIVE_ZERO}; // +0 + + private short[] valuesAllNegativeZero = { +Float16.NEGATIVE_ZERO, // -0 +Float16.NEGATIVE_ZERO, // -0 +Float16.NEGATIVE_ZERO, // -0 +Float16.NEGATIVE_ZERO}; // -0 + + private sh
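The expected-value arrays in this test encode one non-obvious rule, stated in the code comment above: when every value is +0, the recorded minimum is widened to -0 so that a reader filtering on zero never skips the data. The sketch below shows one way such min/max statistics could be computed over raw float16 bit patterns. It is not the Float16Statistics code from the PR. The class name, the sort-key trick, the symmetric widening of a -0 maximum to +0, and the decision to skip NaN are all assumptions of this sketch.

```java
// Hypothetical illustration of float16 min/max with signed-zero widening;
// not the parquet-mr implementation.
final class Float16MinMaxSketch {
  static final short POSITIVE_ZERO = (short) 0x0000;
  static final short NEGATIVE_ZERO = (short) 0x8000;

  // NaN: exponent bits all ones and a non-zero significand.
  static boolean isNaN(short h) {
    return (h & 0x7fff) > 0x7c00;
  }

  // Map a non-NaN bit pattern to an int that sorts in numeric order.
  // Negative values are mirrored so larger magnitudes sort lower; -0 sorts just below +0.
  static int sortKey(short h) {
    int bits = h & 0xffff;
    return (bits & 0x8000) != 0 ? ~(bits & 0x7fff) : bits;
  }

  // Returns {min, max} as bit patterns, or an empty array if no non-NaN value was seen.
  static short[] minMax(short[] values) {
    boolean seen = false;
    short min = 0;
    short max = 0;
    for (short v : values) {
      if (isNaN(v)) {
        continue; // assumption of this sketch: NaN does not take part in min/max
      }
      if (!seen) {
        min = v;
        max = v;
        seen = true;
        continue;
      }
      if (sortKey(v) < sortKey(min)) {
        min = v;
      }
      if (sortKey(v) > sortKey(max)) {
        max = v;
      }
    }
    if (!seen) {
      return new short[0];
    }
    // Widen zero bounds so that neither -0 nor +0 can be filtered out by the statistics.
    if (min == POSITIVE_ZERO) {
      min = NEGATIVE_ZERO;
    }
    if (max == NEGATIVE_ZERO) {
      max = POSITIVE_ZERO;
    }
    return new short[] {min, max};
  }

  public static void main(String[] args) {
    short[] allPositiveZero = {POSITIVE_ZERO, POSITIVE_ZERO, POSITIVE_ZERO, POSITIVE_ZERO};
    short[] result = minMax(allPositiveZero);
    System.out.println(Integer.toHexString(result[0] & 0xffff) + " " + Integer.toHexString(result[1] & 0xffff));
  }
}
```

Under these assumptions, an all-(+0) column yields the pair (-0, +0), which matches the valuesAllPositiveStatsZeroMinMax array quoted above.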
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776244#comment-17776244 ] ASF GitHub Bot commented on PARQUET-1647: - wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362272287 ## parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.column.statistics; + +import org.apache.parquet.schema.PrimitiveType; + +public class Float16Statistics extends BinaryStatistics { Review Comment: I mean, why not use `BinaryStatistics` directly if `Float16Statistics` does not add any specific logic to it? Does anything rely on an `instanceof Float16Statistics` check?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775330#comment-17775330 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359678599 ## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ## @@ -0,0 +1,339 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.type; + +import java.nio.ByteBuffer; +import java.nio.ByteOrder; + +/** + * The class is a utility class to manipulate half-precision 16-bit + * https://en.wikipedia.org/wiki/Half-precision_floating-point_format";>IEEE 754 + * floating point data types (also called fp16 or binary16). A half-precision float can be + * created from or converted to single-precision floats, and is stored in a short data type. 
+ * The IEEE 754 standard specifies an float16 as having the following format: + * + * Sign bit: 1 bit + * Exponent width: 5 bits + * Significand: 10 bits + * + * + * The format is laid out as follows: + * + * 1 1 11 + * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775314#comment-17775314 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359657237 ## parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuildersWithLogicalTypes.java: ## @@ -205,10 +206,20 @@ public void testBinaryAnnotations() { } } + @Test + public void testFloat16Annotations() { +LogicalTypeAnnotation[] types = new LogicalTypeAnnotation[] {float16Type()}; +for (final LogicalTypeAnnotation logicalType : types) { Review Comment: Deleted the unnecessary loop, thanks.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775313#comment-17775313 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359656883 ## parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.parquet.statistics; + +import org.apache.parquet.Preconditions; +import org.apache.parquet.example.data.Group; +import org.apache.parquet.example.data.GroupFactory; +import org.apache.parquet.example.data.simple.SimpleGroupFactory; +import org.apache.parquet.hadoop.ParquetFileReader; +import org.apache.parquet.hadoop.ParquetWriter; +import org.apache.parquet.internal.column.columnindex.ColumnIndex; +import org.apache.parquet.io.api.Binary; +import org.apache.parquet.schema.MessageType; +import org.apache.parquet.schema.Types; +import org.apache.parquet.type.Float16; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.Collections; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.column.statistics.Statistics; +import org.apache.parquet.hadoop.example.ExampleParquetWriter; +import org.apache.parquet.hadoop.example.GroupWriteSupport; +import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData; +import org.apache.parquet.hadoop.util.HadoopInputFile; + +import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY; +import static org.junit.Assert.assertEquals; + +public class TestFloat16Statistics { + + @Rule + public TemporaryFolder temp = new TemporaryFolder(); + + private short[] valuesInAscendingOrder = { +(short) 0xfc00, // -Infinity +(short) 0xc000, // -2.0 +-Float16.MAX_VALUE, // -6.109476E-5 +Float16.NEGATIVE_ZERO, // -0 +Float16.POSITIVE_ZERO, // +0 +Float16.MIN_VALUE, // 5.9604645E-8 +Float16.MAX_VALUE, // 65504.0 +(short) 0x7c00}; // Infinity + + @Test + public void testFloat16ColumnIndex() throws IOException + { +MessageType schema = Types.buildMessage(). 
+ required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg"); + +Configuration conf = new Configuration(); +GroupWriteSupport.setSchema(schema, conf); + +GroupFactory factory = new SimpleGroupFactory(schema); +Path path = newTempPath(); +try (ParquetWriter writer = ExampleParquetWriter.builder(path) + .withConf(conf) + .withDictionaryEncoding(false) + .build()) { + + for (short value : valuesInAscendingOrder) { +writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value)))); + } +} + +try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) { + + ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0); + ColumnIndex index = reader.readColumnIndex(column); + assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues())); Review Comment: good suggestions, added more tests
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775310#comment-17775310 ] ASF GitHub Bot commented on PARQUET-1647: - zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359649367 ## parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.column.statistics; + +import org.apache.parquet.schema.PrimitiveType; + +public class Float16Statistics extends BinaryStatistics { Review Comment: ``` Float16Statistics(PrimitiveType type) { super(type); } ``` we need it here
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774349#comment-17774349 ]

ASF GitHub Bot commented on PARQUET-1647:

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1356337298

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1   11111   1111111111
+ * ^   --^
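The 1/5/10-bit layout described in the quoted Javadoc can be decoded by hand. The following is a minimal sketch of that decoding, not the `Float16` class from PR #1142: the class name `Float16DecodeSketch` and the method `toFloat` are hypothetical, and the arithmetic assumes the standard binary16 rules (exponent bias of 15, subnormals scaled by 2^-24).

```java
// Hypothetical sketch; this is not the Float16 utility class under review.
final class Float16DecodeSketch {

    /** Decodes an IEEE 754 binary16 bit pattern held in a short into a float. */
    static float toFloat(short bits) {
        int h = bits & 0xFFFF;             // treat the short as an unsigned 16-bit value
        int sign = h >>> 15;               // 1 sign bit
        int exponent = (h >>> 10) & 0x1F;  // 5 exponent bits, bias 15
        int significand = h & 0x3FF;       // 10 significand bits

        float magnitude;
        if (exponent == 0) {
            // Zero or subnormal: significand * 2^-24 (no implicit leading 1)
            magnitude = significand * 0x1p-24f;
        } else if (exponent == 0x1F) {
            // All-ones exponent: infinity if the significand is zero, NaN otherwise
            magnitude = (significand == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal: (1 + significand / 1024) * 2^(exponent - 15)
            magnitude = Math.scalb(1.0f + significand / 1024.0f, exponent - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        // Bit patterns taken from the test data quoted elsewhere in this thread.
        System.out.println(toFloat((short) 0xfc00)); // commented there as -Infinity
        System.out.println(toFloat((short) 0xc000)); // commented there as -2.0
        System.out.println(toFloat((short) 0x7c00)); // commented there as Infinity
    }
}
```

Every binary16 value is exactly representable as a single-precision float, so this direction of the conversion never rounds; only the float-to-half direction has to deal with rounding and overflow.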
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774291#comment-17774291 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355951576

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1   11111   1111111111
+ * ^   --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774290#comment-17774290 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355948744

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+    }
+
+    @Override
+    public Statistics build() {

Review Comment:
Thanks for confirmation! @benibus
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774215#comment-17774215 ]

ASF GitHub Bot commented on PARQUET-1647:

benibus commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355712865

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+    }
+
+    @Override
+    public Statistics build() {

Review Comment:
Sorry, missed this comment somehow. Yes, this looks correct.
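The body of `Float16Builder.build()` is not shown in the quoted hunk. As a rough illustration of the special cases its comment names (NaN, -0.0 and 0.0), here is a minimal sketch that assumes FLOAT16 statistics follow the same convention Parquet uses for FLOAT and DOUBLE: statistics are dropped when a bound is NaN, a minimum of +0 is widened to -0, and a maximum of -0 is widened to +0. The class `Float16MinMaxSketch` and its methods are hypothetical and do not come from the PR.

```java
import java.util.Optional;

// Hypothetical sketch of min/max normalization; not the PR's Float16Builder.
final class Float16MinMaxSketch {
    static final short POSITIVE_ZERO = (short) 0x0000;
    static final short NEGATIVE_ZERO = (short) 0x8000;

    /** A binary16 value is NaN when the exponent is all ones and the significand is non-zero. */
    static boolean isNaN(short bits) {
        return (bits & 0x7FFF) > 0x7C00;
    }

    /**
     * Returns the {min, max} pair to store, or empty when the statistics
     * should be omitted because one of the bounds is NaN.
     */
    static Optional<short[]> normalize(short min, short max) {
        if (isNaN(min) || isNaN(max)) {
            return Optional.empty();
        }
        // +0 and -0 compare equal numerically, so widen the bounds to cover both zeros.
        if (min == POSITIVE_ZERO) {
            min = NEGATIVE_ZERO;
        }
        if (max == NEGATIVE_ZERO) {
            max = POSITIVE_ZERO;
        }
        return Optional.of(new short[] {min, max});
    }
}
```

Widening the zero bounds means a reader filtering on either zero cannot wrongly skip a page that holds the other one, whichever bit pattern the writer happened to see first.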
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773134#comment-17773134 ]

ASF GitHub Bot commented on PARQUET-1647:

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349905085

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1   11111   1111111111
+ * ^   --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773073#comment-17773073 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349815502

## parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java: ##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {
+    MessageType schema = Types.buildMessage().
+      required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg");
+
+    Configuration conf = new Configuration();
+    GroupWriteSupport.setSchema(schema, conf);
+
+    GroupFactory factory = new SimpleGroupFactory(schema);
+    Path path = newTempPath();
+    try (ParquetWriter writer = ExampleParquetWriter.builder(path)
+      .withConf(conf)
+      .withDictionaryEncoding(false)
+      .build()) {
+
+      for (short value : valuesInAscendingOrder) {
+        writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value))));
+      }
+    }
+
+    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
+
+      ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0);
+      ColumnIndex index = reader.readColumnIndex(column);
+      assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues()));

Review Comment:
We also need to test these cases:
- NaN values are present.
- All values are +0 or -0.
- Values are in ascending/descending/undefined order.
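Checking the cases listed in this review comment needs a numeric ordering over raw float16 bit patterns. A minimal sketch of one such ordering follows, assuming that -0 should sort before +0 and that NaN values are filtered out by the caller. The class `Float16OrderSketch` and its methods are hypothetical and are not part of the PR. The sample array uses literal bit patterns: `0x8401` stands in for the entry the quoted test comments as -6.109476E-5, and `0x0001` and `0x7bff` for the entries commented as 5.9604645E-8 and 65504.0.

```java
// Hypothetical sketch; not the comparison code from the PR.
final class Float16OrderSketch {

    /**
     * Maps a sign-magnitude binary16 bit pattern to an int whose natural order
     * matches numeric order, with -0 placed immediately before +0.
     * NaN patterns are not handled and must be excluded by the caller.
     */
    static int sortKey(short bits) {
        int h = bits & 0xFFFF;
        int magnitude = h & 0x7FFF;
        boolean negative = (h & 0x8000) != 0;
        // Negative values: larger magnitude means smaller number, and -0 maps to -1.
        return negative ? -magnitude - 1 : magnitude;
    }

    /** Returns true when each value is greater than or equal to the one before it. */
    static boolean isAscending(short[] values) {
        for (int i = 1; i < values.length; i++) {
            if (sortKey(values[i]) < sortKey(values[i - 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        short[] sample = {
            (short) 0xfc00, // -Infinity
            (short) 0xc000, // -2.0
            (short) 0x8401, // small negative normal value
            (short) 0x8000, // -0
            (short) 0x0000, // +0
            (short) 0x0001, // smallest positive subnormal
            (short) 0x7bff, // largest finite value
            (short) 0x7c00  // Infinity
        };
        System.out.println(isAscending(sample));
    }
}
```

Reversing the sample array, or shuffling it, gives inputs for the descending and undefined-order cases the comment asks for.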
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773072#comment-17773072 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349815242

## parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java: ##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {
+    MessageType schema = Types.buildMessage().
+      required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg");
+
+    Configuration conf = new Configuration();
+    GroupWriteSupport.setSchema(schema, conf);
+
+    GroupFactory factory = new SimpleGroupFactory(schema);
+    Path path = newTempPath();
+    try (ParquetWriter writer = ExampleParquetWriter.builder(path)
+      .withConf(conf)
+      .withDictionaryEncoding(false)
+      .build()) {
+
+      for (short value : valuesInAscendingOrder) {
+        writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value))));
+      }
+    }
+
+    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
+
+      ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0);
+      ColumnIndex index = reader.readColumnIndex(column);
+      assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues()));

Review Comment:
It would be nice if we can verify the different boundary order of column index (like ascending, descending, etc.). To achieve this, we might need more values.
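For reference, the boundary order of a column index is derived from the per-page minimum and maximum values, which is why a single page of values cannot exercise every case. The sketch below shows one way to classify it. It assumes the usual Parquet rule that the order is ascending when both the min list and the max list are non-decreasing, descending when both are non-increasing, and unordered otherwise. The class `BoundaryOrderSketch` is hypothetical, works on already-decoded `float` bounds for brevity, and ignores null pages.

```java
// Hypothetical sketch; not the column index builder from parquet-mr.
final class BoundaryOrderSketch {

    enum Order { ASCENDING, DESCENDING, UNORDERED }

    /**
     * Classifies page bounds. mins[i] and maxes[i] are the bounds of page i;
     * both arrays are assumed to have the same length and to contain no NaN.
     */
    static Order of(float[] mins, float[] maxes) {
        boolean ascending = true;
        boolean descending = true;
        for (int i = 1; i < mins.length; i++) {
            if (mins[i] < mins[i - 1] || maxes[i] < maxes[i - 1]) {
                ascending = false;
            }
            if (mins[i] > mins[i - 1] || maxes[i] > maxes[i - 1]) {
                descending = false;
            }
        }
        // With zero or one page both flags stay true; ascending is reported in that case.
        if (ascending) {
            return Order.ASCENDING;
        }
        return descending ? Order.DESCENDING : Order.UNORDERED;
    }
}
```

A test that wants to see all three outcomes therefore needs several pages per column chunk, for example by writing enough values or by lowering the page size limit on the writer.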
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773071#comment-17773071 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349814639

## parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java: ##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {

Review Comment:
```suggestion
  public void testFloat16ColumnIndex() throws IOException {
```
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772837#comment-17772837 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349533756

## parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuildersWithLogicalTypes.java: ##
@@ -205,10 +206,20 @@ public void testBinaryAnnotations() {
     }
   }

+  @Test
+  public void testFloat16Annotations() {
+    LogicalTypeAnnotation[] types = new LogicalTypeAnnotation[] {float16Type()};
+    for (final LogicalTypeAnnotation logicalType : types) {

Review Comment:
It seems that we don't need this for loop?

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java: ##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
If there is no override, do we actually need this subclass?

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
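The bit layout quoted in the Javadoc above (1 sign bit, 5 exponent bits, 10 significand bits) can be illustrated with a small decoder. This is only a minimal sketch, not the `Float16` class from the PR: the class name `HalfFloatSketch` and its `toFloat` method are hypothetical, and it assumes the standard IEEE 754 binary16 exponent bias of 15.

```java
/** Hypothetical illustration only; not the Float16 class under review in PR #1142. */
final class HalfFloatSketch {
    private HalfFloatSketch() {}

    /** Converts raw IEEE 754 binary16 bits (held in a short) to a float. */
    static float toFloat(short bits) {
        int h = bits & 0xFFFF;             // treat the short as unsigned
        int sign = (h >>> 15) & 0x1;       // 1 sign bit
        int exponent = (h >>> 10) & 0x1F;  // 5 exponent bits, bias 15 (assumed per IEEE 754)
        int significand = h & 0x3FF;       // 10 significand bits

        float magnitude;
        if (exponent == 0) {
            // Zero or subnormal: significand * 2^-24
            magnitude = significand * 0x1p-24f;
        } else if (exponent == 0x1F) {
            // All-ones exponent: infinity when the significand is zero, NaN otherwise
            magnitude = (significand == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal number: (1 + significand / 1024) * 2^(exponent - 15)
            magnitude = Math.scalb(1.0f + significand / 1024.0f, exponent - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(toFloat((short) 0x3C00)); // 1.0
        System.out.println(toFloat((short) 0xC000)); // -2.0
        System.out.println(toFloat((short) 0x7BFF)); // 65504.0, the largest finite half
    }
}
```

Keeping the value in a `short` and widening only on demand matches the Javadoc's statement that a half-precision float "is stored in a short data type".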
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772099#comment-17772099 ]

ASF GitHub Bot commented on PARQUET-1647:

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1346873547

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770836#comment-17770836 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1342092712

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769429#comment-17769429 ]

ASF GitHub Bot commented on PARQUET-1647:

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1337285077

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769386#comment-17769386 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1337924016

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }

+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;

Review Comment:
```suggestion
      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
```

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }

+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+    }
+
+    @Override
+    public Statistics build() {
+      Float16Statistics stats = (Float16Statistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Binary bMin = stats.genericGetMin();
+        Binary bMax = stats.genericGetMax();
+        short min = Float16.fromBytesLittleEndian(bMin.getBytes());
+        short max = Float16.fromBytesLittleEndian(bMax.getBytes());
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (Float16.isNaN(min) || Float16.isNaN(max)) {
+          bMin = Binary.fromConstantByteArray(Float16.toBytesLittleEndian(Float16.POSITIVE_ZERO));

Review Comment:
It seems worth adding two static constants (of Binary type) to Float16 for POSITIVE_ZERO and NEGATIVE_ZERO as they are repeatedly constructed.

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -226,6 +268,11 @@ public static Builder getBuilderForReading(PrimitiveType type) {
         return new FloatBuilder(type);
       case DOUBLE:
         return new DoubleBuilder(type);
+      case BINARY:

Review Comment:
```suggestion
      case FIXED_LEN_BYTE_ARRAY:
```

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java: ##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.LogicalTypeAnnotation;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Types;
+
+public class Float16Statistics extends BinaryStatistics
+{
+  // A fake type object to be used to generate the proper comparator
+  private static final PrimitiveType DEFAULT_FAKE_TYPE = Types.optional(PrimitiveType.PrimitiveTypeName.BINARY)
+    .named("fake_binary_float16_type").withLogicalTypeAnnotation(LogicalTypeAnnotation.float16Type());
+
+  /**
+   * @deprecated will be removed in 2.0.0. Use {@link Statistics#createStats(org.apache.parquet.schema.Type)} instead
+   */
+  @Deprecated

Review Comment:
We shouldn't even add this if it is a deprecated one.

## parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java: ##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }

+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;

Review Comment:
Please check the fixed length (2) as well.

## parquet-common/src/main/java/org/apache/parquet/type/Float16.java: ###
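The special cases this review touches on (NaN bounds and signed zeros) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's `Float16Builder`: it assumes float16 statistics follow the same convention as Parquet's float and double statistics, where bounds are dropped if either is NaN, a zero minimum is widened to -0.0, and a zero maximum is widened to +0.0 so that neither zero is filtered out. The class `Float16MinMaxSketch` and its methods are hypothetical and work on raw 16-bit patterns instead of `Binary` values.

```java
import java.util.Optional;

/** Hypothetical illustration only; not the Float16Builder from PR #1142. */
final class Float16MinMaxSketch {
    static final short POSITIVE_ZERO = (short) 0x0000;
    static final short NEGATIVE_ZERO = (short) 0x8000;

    private Float16MinMaxSketch() {}

    /** NaN has an all-ones exponent (0x7C00) and a non-zero significand. */
    static boolean isNaN(short bits) {
        return (bits & 0x7FFF) > 0x7C00;
    }

    /** True for both +0.0 and -0.0: everything except the sign bit is zero. */
    static boolean isZero(short bits) {
        return (bits & 0x7FFF) == 0;
    }

    /**
     * Returns the adjusted {min, max} pair, or an empty Optional when the
     * bounds must be dropped because ordering is undefined for NaN.
     */
    static Optional<short[]> normalize(short min, short max) {
        if (isNaN(min) || isNaN(max)) {
            return Optional.empty();
        }
        // Widen zero bounds so a filter on either zero cannot skip the other.
        short adjustedMin = isZero(min) ? NEGATIVE_ZERO : min;
        short adjustedMax = isZero(max) ? POSITIVE_ZERO : max;
        return Optional.of(new short[] {adjustedMin, adjustedMax});
    }

    public static void main(String[] args) {
        // min = +0.0, max = 1.0 (0x3C00): the minimum is rewritten to -0.0
        short[] bounds = normalize(POSITIVE_ZERO, (short) 0x3C00).get();
        System.out.printf("min=0x%04X max=0x%04X%n", bounds[0] & 0xFFFF, bounds[1] & 0xFFFF);
    }
}
```

Holding the two zero patterns as constants, as above, is in the spirit of the reviewer's suggestion to avoid constructing them repeatedly.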
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769041#comment-17769041 ]

ASF GitHub Bot commented on PARQUET-1647:

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1734974224

> @wgtmac @benibus please help revisit this PR once you get a chance?

I will take a look later this week. cc @shangxinli @gszadovszky
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768887#comment-17768887 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1734364109

@wgtmac @benibus please help revisit this PR once you get a chance?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768342#comment-17768342 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132433

## parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java: ##
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
       }
     }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+    @Override
+    String stringifyNotNull(Binary value) {
+      if (value.length() != 2) {
+        return BINARY_INVALID;
+      }
+      ByteBuffer buffer = value.toByteBuffer().order(ByteOrder.LITTLE_ENDIAN);
+      return DEFAULT_STRINGIFIER.stringify(toFloat(buffer.getShort(buffer.position())));

Review Comment:
Added a Float16.toFloatString(..) method; could you please check whether it makes sense?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768341#comment-17768341 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132367

## parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java: ##
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
       }
     }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+    @Override
+    String stringifyNotNull(Binary value) {
+      if (value.length() != 2) {
+        return BINARY_INVALID;

Review Comment:
Created InvalidFloat16ValueException and now throw it instead of returning an invalid value.
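The stringifier discussion above hinges on two details: a float16 value occupies exactly 2 bytes, and those bytes are read in little-endian order. A minimal sketch of both follows. The class `Float16BytesSketch` is hypothetical; the method names echo the `fromBytesLittleEndian` / `toBytesLittleEndian` calls quoted elsewhere in this thread, but the bodies are assumptions, and `IllegalArgumentException` merely stands in for whatever exception type the PR settles on.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration only; not the Float16 utility from PR #1142. */
final class Float16BytesSketch {
    static final int FLOAT16_LENGTH = 2; // a half-precision value is exactly 2 bytes

    private Float16BytesSketch() {}

    /** Reads the raw float16 bit pattern from a 2-byte little-endian array. */
    static short fromBytesLittleEndian(byte[] bytes) {
        if (bytes.length != FLOAT16_LENGTH) {
            // Fail loudly rather than returning a placeholder value.
            throw new IllegalArgumentException(
                "Expected " + FLOAT16_LENGTH + " bytes for a float16 value but got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
    }

    /** Writes the raw float16 bit pattern as 2 little-endian bytes. */
    static byte[] toBytesLittleEndian(short bits) {
        return ByteBuffer.allocate(FLOAT16_LENGTH)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putShort(bits)
            .array();
    }

    public static void main(String[] args) {
        // 1.0 in float16 is 0x3C00, so the low byte 0x00 comes first.
        byte[] one = toBytesLittleEndian((short) 0x3C00);
        System.out.printf("%02X %02X%n", one[0], one[1]); // 00 3C
    }
}
```

Throwing on a wrong length surfaces corrupt data at the point of conversion, whereas returning a sentinel string defers the problem to whoever reads the output.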
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768340#comment-17768340 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132272

## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768339#comment-17768339 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132034

## parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java: ##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;
+
+public class TestFloat16

Review Comment:
Added more methods with tests!
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768338#comment-17768338 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131998

## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;

Review Comment:
Makes sense, moved!
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768337#comment-17768337 ]

ASF GitHub Bot commented on PARQUET-1647:

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131904

## parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java: ##
@@ -990,6 +990,30 @@ private void testUseStatsWithSignedSortOrder(StatsHelper helper) {
     }
   }

+  @Test
+  public void testFloat16Stats() {
+    BinaryStatistics bStats = new BinaryStatistics();

Review Comment:
Added a **Float16Statistics** class and a **Float16Builder** in **Statistics**; please check whether it makes sense.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768335#comment-17768335 ]
ASF GitHub Bot commented on PARQUET-1647:
zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131781
## parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveComparator.java: ##
@@ -276,4 +279,24 @@ public String toString() {
       return "BINARY_AS_SIGNED_INTEGER_COMPARATOR";
     }
   };
+
+  /**
+   * This comparator is for comparing two float16 values represented in 2 bytes binary.
+   */
+  static final PrimitiveComparator BINARY_AS_FLOAT16_COMPARATOR = new BinaryComparator() {
Review Comment:
Added a test for the comparator!
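For readers following the comparator discussion, here is a rough, self-contained sketch of how two float16 values held in 2-byte binaries could be compared. It is not the code from PR #1142: the class name `Float16CompareSketch`, the helper names, and the choice to widen to `float` and reuse `Float.compare` are all assumptions made for illustration, as is the little-endian byte order. The ordering of NaN and signed zeros follows `Float.compare`, which may differ from what the PR settles on.

```java
public class Float16CompareSketch {

  // Widen an IEEE 754 binary16 bit pattern (held in a short) to a float.
  static float toFloat(short h) {
    int bits = h & 0xFFFF;
    int sign = (bits & 0x8000) << 16;   // move sign to float's bit 31
    int exp = (bits >>> 10) & 0x1F;     // 5-bit exponent, bias 15
    int man = bits & 0x3FF;             // 10-bit significand
    if (exp == 0) {
      // Zero or subnormal: value is man * 2^-24, exactly representable as a float.
      float v = man * (1.0f / (1 << 24));
      return sign != 0 ? -v : v;
    }
    if (exp == 0x1F) {
      // Infinity (man == 0) or NaN (man != 0).
      return Float.intBitsToFloat(sign | 0x7F800000 | (man << 13));
    }
    // Normal number: rebias the exponent from 15 to 127 (difference 112).
    return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (man << 13));
  }

  // Assemble a short from two bytes, assuming little-endian order.
  static short fromLittleEndian(byte[] b) {
    return (short) ((b[0] & 0xFF) | ((b[1] & 0xFF) << 8));
  }

  // Compare two 2-byte float16 values numerically (hypothetical helper).
  static int compare(byte[] a, byte[] b) {
    return Float.compare(toFloat(fromLittleEndian(a)), toFloat(fromLittleEndian(b)));
  }

  public static void main(String[] args) {
    byte[] one = {0x00, 0x3C};             // 0x3C00 = 1.0
    byte[] minusTwo = {0x00, (byte) 0xC0}; // 0xC000 = -2.0
    System.out.println(compare(one, minusTwo) > 0);
  }
}
```

Widening to `float` keeps the sketch short; comparing the raw 16-bit patterns directly would avoid the conversion at the cost of handling the sign bit by hand.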
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768334#comment-17768334 ]
ASF GitHub Bot commented on PARQUET-1647:
zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131705
## parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java: ##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;
+
+public class TestFloat16
+{
+  @Test
+  public void testFloat16ToFloat() {
+    // Zeroes, NaN and infinities
+    assertEquals(0.0f, toFloat(toFloat16(0.0f)), 0.0f);
+    assertEquals(-0.0f, toFloat(toFloat16(-0.0f)), 0.0f);
+    assertEquals(Float.NaN, toFloat(toFloat16(Float.NaN)), 0.0f);
+    assertEquals(Float.POSITIVE_INFINITY, toFloat(toFloat16(Float.POSITIVE_INFINITY)), 0.0f);
+    assertEquals(Float.NEGATIVE_INFINITY, toFloat(toFloat16(Float.NEGATIVE_INFINITY)), 0.0f);
+    // Known values
+    assertEquals(1.0009765625f, toFloat(toFloat16(1.0009765625f)), 0.0f);
+    assertEquals(-2.0f, toFloat(toFloat16(-2.0f)), 0.0f);
+    assertEquals(6.1035156e-5f, toFloat(toFloat16(6.10352e-5f)), 0.0f); // Inexact
+    assertEquals(65504.0f, toFloat(toFloat16(65504.0f)), 0.0f);
+    assertEquals(0.33325195f, toFloat(toFloat16(1.0f / 3.0f)), 0.0f); // Inexact
+    // Denormals (flushed to +/-0)
+    assertEquals(6.097555e-5f, toFloat(toFloat16(6.09756e-5f)), 0.0f);
+    assertEquals(5.9604645e-8f, toFloat(toFloat16(5.96046e-8f)), 0.0f);
+    assertEquals(-6.097555e-5f, toFloat(toFloat16(-6.09756e-5f)), 0.0f);
+    assertEquals(-5.9604645e-8f, toFloat(toFloat16(-5.96046e-8f)), 0.0f);
+  }
+
+  @Test
+  public void testFloatToFloat16() {
+    // Zeroes, NaN and infinities
+    assertEquals(POSITIVE_ZERO, toFloat16(0.0f));
+    assertEquals(NEGATIVE_ZERO, toFloat16(-0.0f));
+    assertEquals(NaN, toFloat16(Float.NaN));
Review Comment:
Good suggestion, added!
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768198#comment-17768198 ]
ASF GitHub Bot commented on PARQUET-1647:
zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1732203105
> CI failures are likely due to the fact that the addition of the logical type to parquet-format is unmerged, so the specific [PR branch](https://github.com/apache/parquet-format/pull/184) needs to be manually installed for the build to pass. I'm not sure if there's a good solution yet, as this implementation needs to be present for said parquet-format PR to be voted on and merged.
Agreed! We can merge the parquet-format [PR](https://github.com/apache/parquet-format/pull/184) first, once this diff is ready.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767210#comment-17767210 ]
ASF GitHub Bot commented on PARQUET-1647:
benibus commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1331975632
## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * IEEE 754 (https://en.wikipedia.org/wiki/Half-precision_floating-point_format)
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1 1 11
+ * ^ --^
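The field widths quoted in the Javadoc above (1 sign bit, 5 exponent bits, 10 significand bits) can be illustrated with a small bit-masking sketch. This is only an illustration of the layout, not code from the PR; the class name `Float16FieldsSketch` and its method names are made up here.

```java
public class Float16FieldsSketch {

  // Bit 15: sign (0 = positive, 1 = negative).
  static int sign(short h) {
    return ((h & 0xFFFF) >>> 15) & 0x1;
  }

  // Bits 14..10: biased exponent (bias 15 for binary16).
  static int exponent(short h) {
    return ((h & 0xFFFF) >>> 10) & 0x1F;
  }

  // Bits 9..0: significand (fraction) bits.
  static int significand(short h) {
    return h & 0x3FF;
  }

  public static void main(String[] args) {
    short one = (short) 0x3C00; // bit pattern of 1.0 in binary16
    System.out.println("sign=" + sign(one)
        + " exponent=" + exponent(one)
        + " significand=" + significand(one));
  }
}
```

Masking with `0xFFFF` before shifting avoids the sign extension that Java applies when a negative `short` is promoted to `int`.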
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747747#comment-17747747 ]
Freddy Fostvedt commented on PARQUET-1647:
Thanks for putting effort into this [~benpharkins], this is a very valuable piece at my place of work. It will save very significant costs on data processing / training cost if we can reduce memory usage.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747168#comment-17747168 ]
Ben Harkins commented on PARQUET-1647:
I'm currently working on this, so feel free to assign me. Although it's probably worth mentioning that the current plan is to implement this as a logical type in accordance with the proposal PR for [PARQUET-758|https://issues.apache.org/jira/browse/PARQUET-758], which deviates from some of the plan in this issue's description.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634092#comment-17634092 ]
JAVIER ANDRES RECASENS SANCHEZ commented on PARQUET-1647:
[~the_alchemist] thanks for the update. Is there anyone that could help with this?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527761#comment-17527761 ]
The Alchemist commented on PARQUET-1647:
[~jrecasens], [~orecoupa]: Unfortunately, I have moved on and don't have the time to work on float16 Parquet support.
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527758#comment-17527758 ]
Orestis commented on PARQUET-1647:
[~the_alchemist] Thank you for the initiative. Is there any update for this issue?
[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16
[ https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527757#comment-17527757 ]
JAVIER ANDRES RECASENS SANCHEZ commented on PARQUET-1647:
Any updates regarding this issue? We are very interested in float16 support. Thanks!