[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792698#comment-17792698
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1838002383

   BTW, it would be good to add an interoperability test that reads the Parquet 
files from here: 
https://github.com/apache/parquet-testing/commit/da467dac2f095b979af37bcf40fa0d1dee5ff652
   You may want to take a look at this example: 
https://github.com/apache/parquet-mr/blob/44b56225be6fe7b74667f4f2430326ef1f076cc5/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java#L40
 




> [Java] support for Arrow's float16
> --
>
> Key: PARQUET-1647
> URL: https://issues.apache.org/jira/browse/PARQUET-1647
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format, parquet-thrift
>Reporter: The Alchemist
>Priority: Minor
>
> h2. DESCRIPTION
>  
> I'm wondering if there's any interest in supporting Arrow's {{float16}} type 
> in Parquet.
> There seem to be one or two {{float16}} / {{halffloat}} tickets here (e.g., 
> PARQUET-1403) but nothing that speaks to adding half-float support to Parquet 
> in general.
>  
> h2. PLANS
> I'm able to spend some time on this, if someone points me in the right 
> direction.
>  
>  # Add the {{HALFFLOAT}} or {{FLOAT16}} enum (any preferred naming 
> convention?) to 
> [https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L32]
>  # Add {{HALFFLOAT}} to {{org.apache.parquet.schema.PrimitiveType}}
>  # Add {{HALFFLOAT}} support to 
> {{org.apache.parquet.arrow.schema.SchemaConverter}}
>  # Add encoding for new type at {{org.apache.parquet.column.Encoding}}
>  # ??
> If anyone has any interest in this, pointers, or comments, they would be 
> greatly appreciated!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792667#comment-17792667
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413455235


##
pom.xml:
##
@@ -596,6 +597,9 @@
 


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792628#comment-17792628
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413353347


##
pom.xml:
##
@@ -596,6 +597,9 @@
 


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792621#comment-17792621
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1837800275

   > Could you please rebase it?
   
   Rebased. Could you help merge this PR?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791520#comment-17791520
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1833370284

   Could you please rebase it?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-11-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789925#comment-17789925
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1827206658

   > @zhangjiashen This can be rebased to adopt parquet-format 2.10.0
   
   @wgtmac I just rebased onto the master branch. Could you please take a look 
when you get a chance?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-11-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789297#comment-17789297
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1825081998

   @zhangjiashen This can be rebased to adopt parquet-format 2.10.0






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780598#comment-17780598
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783745746

   > @wgtmac, I don't think we automatically deploy snapshot versions. And, we 
will need a final release of parquet-format anyway, before we can get this one 
merged.
   
   OK, then let's wait until format v2.10 is released. Once two PoC 
implementations of https://github.com/apache/parquet-format/pull/197 have been 
finished, I will kick off the release process.






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780482#comment-17780482
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783252917

   @wgtmac, I don't think we automatically deploy snapshot versions. And, we 
will need a final release of parquet-format anyway, before we can get this one 
merged.






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780446#comment-17780446
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1783165574

   https://github.com/apache/parquet-format/pull/184 is merged. Could you try 
to set `parquet.format.version` to 2.10.0-SNAPSHOT in the pom.xml and check if 
the CIs are green?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779324#comment-17779324
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1371134589


##
parquet-column/src/main/java/org/apache/parquet/schema/Float16.java:
##
@@ -46,29 +46,10 @@
  * Ref: 
https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
  */
 public class Float16 {

Review Comment:
   updated them to non-public, thanks!
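
   For readers unfamiliar with the 16-bit layout that the `Float16` utility 
class manipulates, here is a minimal sketch of decoding a half-precision bit 
pattern into a Java `float`. It is an illustration only and not the code from 
this PR: the class name `HalfFloatSketch` and the method `toFloat` are 
hypothetical, and the sketch assumes the IEEE 754 binary16 layout (1 sign bit, 
5 exponent bits with bias 15, 10 mantissa bits).

```java
/** Hypothetical illustration; not the Float16 class from parquet-mr. */
final class HalfFloatSketch {
  private HalfFloatSketch() {}

  /** Decodes an IEEE 754 binary16 bit pattern into a float. */
  static float toFloat(short h) {
    int bits = h & 0xFFFF;            // treat the short as unsigned
    int sign = (bits & 0x8000) << 16; // move the sign to bit 31
    int exp = (bits >>> 10) & 0x1F;   // 5 exponent bits
    int mant = bits & 0x3FF;          // 10 mantissa bits

    if (exp == 0) {
      if (mant == 0) {
        return Float.intBitsToFloat(sign); // +0.0 or -0.0
      }
      // Subnormal half: value = mant * 2^-24, exactly representable as a float.
      float v = mant * (1.0f / (1 << 24));
      return sign != 0 ? -v : v;
    }
    if (exp == 0x1F) {
      // Infinity (mant == 0) or NaN (mant != 0); keep the mantissa payload.
      return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
    }
    // Normal number: re-bias the exponent from 15 to 127 and widen the mantissa.
    return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (mant << 13));
  }
}
```

   For example, `HalfFloatSketch.toFloat((short) 0x3C00)` yields `1.0f`, and 
the bit pattern `0x8000` decodes to negative zero.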







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779323#comment-17779323
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1371132290


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
 
 @Override
 public Statistics build() {
-  Float16Statistics stats = (Float16Statistics) super.build();
+  BinaryStatistics stats = (BinaryStatistics) super.build();
   if (stats.hasNonNullValue()) {
 Binary bMin = stats.genericGetMin();
 Binary bMax = stats.genericGetMax();
 short min = bMin.get2BytesLittleEndian();
 short max = bMax.get2BytesLittleEndian();
 // Drop min/max values in case of NaN as the sorting order of values 
is undefined for this case
 if (Float16.isNaN(min) || Float16.isNaN(max)) {
-  bMin = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-  bMax = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+  bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});

Review Comment:
   updated, thanks!
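
   The NaN and signed-zero handling in the hunk above can be sketched in 
isolation. The following is a minimal, self-contained illustration and not the 
parquet-mr code: the class `Float16MinMaxSketch` and its `adjust` method are 
hypothetical names, and it works on raw `short` bit patterns instead of 
`Binary` values. It assumes the IEEE 754 half-precision layout (1 sign bit, 
5 exponent bits, 10 mantissa bits).

```java
/** Hypothetical illustration of the min/max rules discussed in the review. */
final class Float16MinMaxSketch {
  static final short POSITIVE_ZERO = (short) 0x0000;
  static final short NEGATIVE_ZERO = (short) 0x8000;

  private Float16MinMaxSketch() {}

  /** A half-float is NaN when its exponent bits are all ones and the mantissa is non-zero. */
  static boolean isNaN(short h) {
    return (h & 0x7FFF) > 0x7C00;
  }

  /**
   * Returns the adjusted {min, max} bit patterns, or null when either bound is
   * NaN, because the ordering of values is undefined in that case and the
   * statistics should be dropped.
   */
  static short[] adjust(short min, short max) {
    if (isNaN(min) || isNaN(max)) {
      return null;
    }
    // Widen zero bounds so that neither -0.0 nor +0.0 can be skipped by a filter.
    if (min == POSITIVE_ZERO) {
      min = NEGATIVE_ZERO;
    }
    if (max == NEGATIVE_ZERO) {
      max = POSITIVE_ZERO;
    }
    return new short[] {min, max};
  }
}
```

   Returning null here only stands in for "no usable min/max"; the PR itself 
signals this through `hasNonNullValue`, as shown in the diff.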







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778553#comment-17778553
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1368247647


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
 
 @Override
 public Statistics build() {
-  Float16Statistics stats = (Float16Statistics) super.build();
+  BinaryStatistics stats = (BinaryStatistics) super.build();
   if (stats.hasNonNullValue()) {
 Binary bMin = stats.genericGetMin();
 Binary bMax = stats.genericGetMax();
 short min = bMin.get2BytesLittleEndian();
 short max = bMax.get2BytesLittleEndian();
 // Drop min/max values in case of NaN as the sorting order of values 
is undefined for this case
 if (Float16.isNaN(min) || Float16.isNaN(max)) {
-  bMin = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-  bMax = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+  bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});
   stats.setMinMax(bMin, bMax);
   ((Statistics) stats).hasNonNullValue = false;
 } else {
   // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values 
would be skipped
-  if (min == Float16.POSITIVE_ZERO) {
-bMin = 
Binary.fromConstantByteArray(Float16.NEGATIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  if (min == (short) 0x0000) {
+bMin = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 
0x80});

Review Comment:
   See above



##
parquet-column/src/main/java/org/apache/parquet/schema/Float16.java:
##
@@ -46,29 +46,10 @@
  * Ref: 
https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java
  */
 public class Float16 {

Review Comment:
   Please make any methods that are used only from the same package 
`package-private`. 



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
 
 @Override
 public Statistics build() {
-  Float16Statistics stats = (Float16Statistics) super.build();
+  BinaryStatistics stats = (BinaryStatistics) super.build();
   if (stats.hasNonNullValue()) {
 Binary bMin = stats.genericGetMin();
 Binary bMax = stats.genericGetMax();
 short min = bMin.get2BytesLittleEndian();
 short max = bMax.get2BytesLittleEndian();
 // Drop min/max values in case of NaN as the sorting order of values 
is undefined for this case
 if (Float16.isNaN(min) || Float16.isNaN(max)) {
-  bMin = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
-  bMax = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  bMin = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});
+  bMax = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 0x80});
   stats.setMinMax(bMin, bMax);
   ((Statistics) stats).hasNonNullValue = false;
 } else {
   // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values 
would be skipped
-  if (min == Float16.POSITIVE_ZERO) {
-bMin = 
Binary.fromConstantByteArray(Float16.NEGATIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  if (min == (short) 0x0000) {
+bMin = Binary.fromConstantByteArray(new byte[] {0x00, (byte) 
0x80});
 stats.setMinMax(bMin, bMax);
   }
-  if (max == Float16.NEGATIVE_ZERO) {
-bMax = 
Binary.fromConstantByteArray(Float16.POSITIVE_ZERO_BYTES_LITTLE_ENDIAN);
+  if (max == (short) 0x8000) {
+bMax = Binary.fromConstantByteArray(new byte[] {0x00, 0x00});

Review Comment:
   See above



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -150,26 +150,26 @@ public Float16Builder(PrimitiveType type) {
 
 @Override
 public Statistics build() {
-  Float16Statistics stats = (Float16Statistics) super.build();
+  BinaryStatistics stats = (BinaryStatistics) super.build();
   if (stats.hasNonNullValue()) {
 Binary bMin = stats.genericGetMin();
 Binary bMax = stats.genericGetMax();
 short min = bMin.get2BytesLittleEndian();
 short max = bMax.get2BytesLittleEndian();
 // Drop min/max values in case of NaN as the sorting order of values 
is undefined for this case
 if (Float16.isNaN(min) || Float16.isNaN(max)) {
-

[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778276#comment-17778276
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832738


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
   You are correct: we can use BinaryStatistics directly, so Float16Statistics 
is not needed.







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778275#comment-17778275
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832612


##
parquet-column/src/test/java/org/apache/parquet/io/api/TestBinary.java:
##
@@ -268,4 +268,19 @@ public void testCompare() {
 assertTrue(b1.compareTo(b3) == 0);
 assertTrue(b3.compareTo(b1) == 0);
   }
+
+  @Test
+  public void testGet2BytesLittleEndian() {

Review Comment:
   Added unit tests for this, thanks
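
   As a rough illustration of what a little-endian 2-byte read does, here is 
a stand-alone sketch. It is not the `Binary` implementation from parquet-mr: 
the class `LittleEndianShorts` and the `(byte[], int)` signature are 
hypothetical, chosen so the example runs without any Parquet dependency.

```java
/** Hypothetical stand-alone helper; not the Binary API from parquet-mr. */
final class LittleEndianShorts {
  private LittleEndianShorts() {}

  /**
   * Combines bytes[offset] (low byte) and bytes[offset + 1] (high byte)
   * into a single short.
   */
  static short get2BytesLittleEndian(byte[] bytes, int offset) {
    if (offset < 0 || bytes.length - offset < 2) {
      throw new IllegalArgumentException("need 2 bytes at offset " + offset);
    }
    // Mask with 0xFF so that negative bytes are not sign-extended.
    return (short) (((bytes[offset + 1] & 0xFF) << 8) | (bytes[offset] & 0xFF));
  }
}
```

   With this helper, the bytes `{0x00, (byte) 0x80}` used in the statistics 
code decode to `(short) 0x8000`, the half-precision bit pattern for -0.0.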







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778274#comment-17778274
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832542


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,307 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.util.Arrays;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^
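
The 1/5/10-bit layout described in the Javadoc above can be sketched as follows. This is an illustrative decoder written for this note under the standard binary16 rules (exponent bias 15, subnormals when the exponent field is 0, Infinity/NaN when it is 31). The class `Float16Fields` and its methods are hypothetical names, not the `Float16` API from the PR.

```java
// Hypothetical sketch for illustration; not the Float16 class under review.
final class Float16Fields {

    static int sign(short h) { return (h >>> 15) & 0x1; }       // 1 bit
    static int exponent(short h) { return (h >>> 10) & 0x1f; }  // 5 bits
    static int significand(short h) { return h & 0x3ff; }       // 10 bits

    // Converts a float16 bit pattern to a single-precision float.
    static float toFloat(short h) {
        int e = exponent(h);
        int m = significand(h);
        float magnitude;
        if (e == 0x1f) {
            // All-ones exponent: Infinity when the significand is 0, else NaN.
            magnitude = (m == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else if (e == 0) {
            // Zero or subnormal: m * 2^-24, no implicit leading 1.
            magnitude = Math.scalb((float) m, -24);
        } else {
            // Normal: (1 + m / 2^10) * 2^(e - 15).
            magnitude = Math.scalb(1 + m / 1024f, e - 15);
        }
        return sign(h) == 0 ? magnitude : -magnitude;
    }
}
```

For example, `0x3c00` has exponent field 15 and significand 0, so it decodes to 1.0, and `0x7bff` (exponent field 30, significand 1023) decodes to 65504.0, the largest finite half-precision value.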


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778273#comment-17778273
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1367832331


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776570#comment-17776570
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1363539290


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,307 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.util.Arrays;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776524#comment-17776524
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1363371525


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,307 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.util.Arrays;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776255#comment-17776255
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362333511


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,307 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.util.Arrays;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776254#comment-17776254
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362335145


##
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java:
##
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  private short[] valuesInAscendingOrderMinMax = {
+    (short) 0xfc00, // -Infinity
+    (short) 0x7c00}; // Infinity
+
+  private short[] valuesInDescendingOrder = {
+    (short) 0x7c00, // Infinity
+    Float16.MAX_VALUE, // 65504.0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.POSITIVE_ZERO, // +0
+    Float16.NEGATIVE_ZERO, // -0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    (short) 0xc000, // -2.0
+    (short) 0xfc00}; // -Infinity
+
+  private short[] valuesInDescendingOrderMinMax = {
+    (short) 0xfc00, // -Infinity
+    (short) 0x7c00}; // Infinity
+
+  private short[] valuesUndefinedOrder = {
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00, // Infinity
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.POSITIVE_ZERO, // +0
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    (short) 0xfc00}; // -Infinity
+
+  private short[] valuesUndefinedOrderMinMax = {
+    (short) 0xfc00, // -Infinity
+    (short) 0x7c00}; // Infinity
+
+  private short[] valuesAllPositiveZero = {
+    Float16.POSITIVE_ZERO, // +0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.POSITIVE_ZERO}; // +0
+
+  private short[] valuesAllPositiveZeroMinMax = {
+    Float16.POSITIVE_ZERO, // +0
+    Float16.POSITIVE_ZERO}; // +0
+
+  // Float16Statistics: Updating min to -0.0 to ensure that no 0.0 values would be skipped
+  private short[] valuesAllPositiveStatsZeroMinMax = {
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO}; // +0
+
+  private short[] valuesAllNegativeZero = {
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.NEGATIVE_ZERO}; // -0
+  private sh
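
The expected min/max pairs in the arrays above, including the "min becomes -0.0" case, can be reproduced with a small sketch. It assumes non-empty input with no NaN values, and it assumes the widening rule stated in the test comment: a zero minimum is reported as -0 and a zero maximum as +0, so that neither zero is skipped by a reader. `Float16MinMaxSketch` is a hypothetical name and this is not the PR's statistics code.

```java
// Hypothetical sketch for illustration; not the statistics class in the PR.
final class Float16MinMaxSketch {
    static final short POSITIVE_ZERO = (short) 0x0000;
    static final short NEGATIVE_ZERO = (short) 0x8000;

    // Decodes a float16 bit pattern (1 sign, 5 exponent, 10 significand bits).
    static float toFloat(short h) {
        int e = (h >>> 10) & 0x1f;
        int m = h & 0x3ff;
        float magnitude;
        if (e == 0x1f) {
            magnitude = (m == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else if (e == 0) {
            magnitude = Math.scalb((float) m, -24);
        } else {
            magnitude = Math.scalb(1 + m / 1024f, e - 15);
        }
        return ((h >>> 15) & 0x1) == 0 ? magnitude : -magnitude;
    }

    // Returns {min, max} as float16 bit patterns. Assumes non-empty, NaN-free input.
    static short[] minMax(short[] values) {
        short min = values[0];
        short max = values[0];
        for (short v : values) {
            if (toFloat(v) < toFloat(min)) min = v;
            if (toFloat(v) > toFloat(max)) max = v;
        }
        // +0 and -0 compare equal, so widen zero bounds to cover both signs.
        if (toFloat(min) == 0f) min = NEGATIVE_ZERO;
        if (toFloat(max) == 0f) max = POSITIVE_ZERO;
        return new short[] {min, max};
    }
}
```

Note on the `-Float16.MAX_VALUE` entries: the unary minus applies to the short bit pattern, not to the floating-point value. Since 65504.0 is `0x7bff`, its negation as a short is `0x8401`, which decodes to about -6.109476E-5, matching the comment in the test data.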

[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776244#comment-17776244
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1362272287


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
   I mean, why not use `BinaryStatistics` directly if `Float16Statistics` does 
not add any specific logic to it? Is anything relying on an `instanceof 
Float16Statistics` check?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775330#comment-17775330
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359678599


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775314#comment-17775314
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359657237


##
parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuildersWithLogicalTypes.java:
##
@@ -205,10 +206,20 @@ public void testBinaryAnnotations() {
 }
   }
 
+  @Test
+  public void testFloat16Annotations() {
+    LogicalTypeAnnotation[] types = new LogicalTypeAnnotation[] {float16Type()};
+    for (final LogicalTypeAnnotation logicalType : types) {

Review Comment:
   Deleted the unnecessary loop, thanks.






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775313#comment-17775313
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359656883


##
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java:
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {
+    MessageType schema = Types.buildMessage().
+      required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg");
+
+    Configuration conf = new Configuration();
+    GroupWriteSupport.setSchema(schema, conf);
+
+    GroupFactory factory = new SimpleGroupFactory(schema);
+    Path path = newTempPath();
+    try (ParquetWriter writer = ExampleParquetWriter.builder(path)
+      .withConf(conf)
+      .withDictionaryEncoding(false)
+      .build()) {
+
+      for (short value : valuesInAscendingOrder) {
+        writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value))));
+      }
+    }
+
+    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
+
+      ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0);
+      ColumnIndex index = reader.readColumnIndex(column);
+      assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues()));

Review Comment:
   Good suggestions; I added more tests.






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775310#comment-17775310
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1359649367


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,28 @@
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
   ```
 Float16Statistics(PrimitiveType type) {
   super(type);
 }
   ```
we need it here







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774349#comment-17774349
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1356337298


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   11111   1111111111
+ * ^   --^--   -----^----
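
For reference, the 1/5/10-bit layout described in the Javadoc above can be decoded by hand. The following is a minimal standalone sketch, not the `Float16` class from the PR: the class name `Float16DecodeSketch` and its two methods are hypothetical, and the only things it assumes are the IEEE 754 binary16 layout and a two-byte little-endian encoding.

```java
// Hypothetical helper for illustration only; not part of parquet-mr.
final class Float16DecodeSketch {

  /** Converts raw IEEE 754 binary16 bits (held in a short) to a float. */
  static float toFloat(short bits) {
    int h = bits & 0xFFFF;             // treat the short as unsigned
    int sign = (h >>> 15) & 0x1;       // 1 sign bit
    int exponent = (h >>> 10) & 0x1F;  // 5 exponent bits, bias 15
    int significand = h & 0x3FF;       // 10 significand bits

    float magnitude;
    if (exponent == 0) {
      // Zero or subnormal: significand * 2^-24
      magnitude = significand * 0x1p-24f;
    } else if (exponent == 0x1F) {
      // All-ones exponent: infinity if the significand is zero, NaN otherwise
      magnitude = (significand == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
    } else {
      // Normal: (1 + significand / 1024) * 2^(exponent - 15)
      magnitude = (1.0f + significand / 1024.0f) * (float) Math.pow(2, exponent - 15);
    }
    return sign == 0 ? magnitude : -magnitude;
  }

  /** Encodes the 16 bits as two bytes, least significant byte first. */
  static byte[] toBytesLittleEndian(short bits) {
    return new byte[] {(byte) (bits & 0xFF), (byte) ((bits >>> 8) & 0xFF)};
  }
}
```

With this sketch, `toFloat((short) 0x7bff)` evaluates to 65504.0 and `toFloat((short) 0x0001)` to 2^-24, which agrees with the comments on `Float16.MAX_VALUE` and `Float16.MIN_VALUE` in the test data quoted elsewhere in this thread.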



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774291#comment-17774291
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355951576


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   11111   1111111111
+ * ^   --^--   -----^----



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774290#comment-17774290
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355948744


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+    }
+
+    @Override
+    public Statistics build() {

Review Comment:
   Thanks for confirmation! @benibus 







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774215#comment-17774215
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

benibus commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1355712865


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -139,6 +140,43 @@ public Statistics build() {
     }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+    public Float16Builder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+    }
+
+    @Override
+    public Statistics build() {

Review Comment:
   Sorry, missed this comment somehow. Yes, this looks correct.
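
The builder's comment names NaN, -0.0 and 0.0 as special cases, but the body of `build()` is cut off in the quoted hunk. As a rough illustration only, one common policy for floating-point statistics is to drop min/max when either bound is NaN, and to widen a zero bound so that both signed zeros fall inside the range. The sketch below assumes that policy; `Float16StatsSketch` and `normalizeMinMax` are hypothetical names, and this is not the code from the PR.

```java
// Hypothetical helper for illustration only; not the PR's Float16Builder.
final class Float16StatsSketch {
  static final short POSITIVE_ZERO = (short) 0x0000;
  static final short NEGATIVE_ZERO = (short) 0x8000;

  /** NaN: exponent bits all ones and a non-zero significand. */
  static boolean isNaN(short bits) {
    return (bits & 0x7FFF) > 0x7C00;
  }

  /**
   * Returns {min, max} adjusted for signed zeros, or null when the
   * statistics should be dropped because a bound is NaN.
   */
  static short[] normalizeMinMax(short min, short max) {
    if (isNaN(min) || isNaN(max)) {
      return null; // ordering with NaN is undefined, so publish no min/max
    }
    if (min == POSITIVE_ZERO) {
      min = NEGATIVE_ZERO; // a +0 minimum must not exclude -0 values
    }
    if (max == NEGATIVE_ZERO) {
      max = POSITIVE_ZERO; // a -0 maximum must not exclude +0 values
    }
    return new short[] {min, max};
  }
}
```

The point of widening the zero bounds is that a reader filtering on `x = 0.0` should not skip a page whose recorded minimum happens to be +0 while the page also holds -0.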







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773134#comment-17773134
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349905085


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   11111   1111111111
+ * ^   --^--   -----^----



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773073#comment-17773073
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349815502


##
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java:
##
@@ -0,0 +1,157 @@
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {
+    MessageType schema = Types.buildMessage().
+      required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg");
+
+    Configuration conf = new Configuration();
+    GroupWriteSupport.setSchema(schema, conf);
+
+    GroupFactory factory = new SimpleGroupFactory(schema);
+    Path path = newTempPath();
+    try (ParquetWriter writer = ExampleParquetWriter.builder(path)
+      .withConf(conf)
+      .withDictionaryEncoding(false)
+      .build()) {
+
+      for (short value : valuesInAscendingOrder) {
+        writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value))));
+      }
+    }
+
+    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
+
+      ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0);
+      ColumnIndex index = reader.readColumnIndex(column);
+      assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues()));

Review Comment:
   We also need to test these cases:
   - NaN values are present.
   - All values are +0 or -0.
   - Values are in ascending/descending/undefined order.
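
To build inputs for the cases listed above, it helps to have an ordering on raw float16 bit patterns plus a NaN check. This is a minimal sketch under stated assumptions: `Float16CompareSketch` is a hypothetical helper (not part of parquet-mr), it orders -0 before +0, and it refuses to compare NaN rather than assigning it an arbitrary position.

```java
// Hypothetical helper for illustration only; not part of parquet-mr.
final class Float16CompareSketch {

  /** NaN: exponent bits all ones and a non-zero significand. */
  static boolean isNaN(short bits) {
    return (bits & 0x7FFF) > 0x7C00;
  }

  /** True for both +0 (0x0000) and -0 (0x8000). */
  static boolean isZero(short bits) {
    return (bits & 0x7FFF) == 0;
  }

  /** Maps sign-magnitude bits to an int whose natural order is the numeric order. */
  static int sortKey(short bits) {
    int magnitude = bits & 0x7FFF;
    boolean negative = (bits & 0x8000) != 0;
    // Negative values sort in reverse magnitude order, and -0 lands just below +0.
    return negative ? -magnitude - 1 : magnitude;
  }

  /** Compares two non-NaN float16 bit patterns numerically. */
  static int compare(short a, short b) {
    if (isNaN(a) || isNaN(b)) {
      throw new IllegalArgumentException("NaN has no defined order");
    }
    return Integer.compare(sortKey(a), sortKey(b));
  }

  /** Case 1: detects whether any value is NaN. */
  static boolean containsNaN(short[] values) {
    for (short v : values) {
      if (isNaN(v)) return true;
    }
    return false;
  }

  /** Case 2: detects whether every value is +0 or -0. */
  static boolean allZeros(short[] values) {
    for (short v : values) {
      if (!isZero(v)) return false;
    }
    return true;
  }

  /** Case 3: checks that the values never decrease. */
  static boolean isAscending(short[] values) {
    for (int i = 1; i < values.length; i++) {
      if (compare(values[i - 1], values[i]) > 0) return false;
    }
    return true;
  }
}
```

A test could use these predicates to sanity-check its fixtures before writing them, for example asserting that the ascending fixture really is ascending and that the NaN fixture really contains a NaN.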






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773072#comment-17773072
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349815242


##
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java:
##
@@ -0,0 +1,157 @@
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+    (short) 0xfc00, // -Infinity
+    (short) 0xc000, // -2.0
+    -Float16.MAX_VALUE, // -6.109476E-5
+    Float16.NEGATIVE_ZERO, // -0
+    Float16.POSITIVE_ZERO, // +0
+    Float16.MIN_VALUE, // 5.9604645E-8
+    Float16.MAX_VALUE, // 65504.0
+    (short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {
+    MessageType schema = Types.buildMessage().
+      required(FIXED_LEN_BYTE_ARRAY).as(float16Type()).length(2).named("col_float16").named("msg");
+
+    Configuration conf = new Configuration();
+    GroupWriteSupport.setSchema(schema, conf);
+
+    GroupFactory factory = new SimpleGroupFactory(schema);
+    Path path = newTempPath();
+    try (ParquetWriter writer = ExampleParquetWriter.builder(path)
+      .withConf(conf)
+      .withDictionaryEncoding(false)
+      .build()) {
+
+      for (short value : valuesInAscendingOrder) {
+        writer.write(factory.newGroup().append("col_float16", Binary.fromConstantByteArray(Float16.toBytesLittleEndian(value))));
+      }
+    }
+
+    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
+
+      ColumnChunkMetaData column = reader.getFooter().getBlocks().get(0).getColumns().get(0);
+      ColumnIndex index = reader.readColumnIndex(column);
+      assertEquals(Collections.singletonList((short) 0xfc00), toFloat16List(index.getMinValues()));

Review Comment:
   It would be nice to verify the different boundary orders of the column 
index (ascending, descending, etc.). To do that, we might need more 
values.
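
As a rough illustration of what such a check compares against, the boundary order of a column index can be derived from the per-page min and max values. The sketch below is an assumption-laden simplification: `BoundaryOrderSketch` and `classify` are hypothetical names, the inputs are plain `float` arrays rather than Parquet `Binary` values, and a single page is reported as ascending simply because that branch is tested first.

```java
// Hypothetical helper for illustration only; not parquet-mr's column index builder.
final class BoundaryOrderSketch {
  enum Order { ASCENDING, DESCENDING, UNORDERED }

  /**
   * Classifies per-page bounds: ascending if neither mins nor maxs ever
   * decrease from one page to the next, descending if neither ever increases.
   */
  static Order classify(float[] mins, float[] maxs) {
    if (mins.length != maxs.length) {
      throw new IllegalArgumentException("mins and maxs must have one entry per page");
    }
    boolean ascending = true;
    boolean descending = true;
    for (int i = 1; i < mins.length; i++) {
      if (mins[i] < mins[i - 1] || maxs[i] < maxs[i - 1]) ascending = false;
      if (mins[i] > mins[i - 1] || maxs[i] > maxs[i - 1]) descending = false;
    }
    if (ascending) return Order.ASCENDING;
    if (descending) return Order.DESCENDING;
    return Order.UNORDERED;
  }
}
```

This is also why more values are needed: with a single page there is only one min/max pair, so nothing distinguishes the three orders.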






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773071#comment-17773071
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349814639


##
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestFloat16Statistics.java:
##
@@ -0,0 +1,157 @@
+package org.apache.parquet.statistics;
+
+import org.apache.parquet.Preconditions;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.GroupFactory;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.internal.column.columnindex.ColumnIndex;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.apache.parquet.type.Float16;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+
+import static org.apache.parquet.schema.LogicalTypeAnnotation.float16Type;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
+import static org.junit.Assert.assertEquals;
+
+public class TestFloat16Statistics {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private short[] valuesInAscendingOrder = {
+(short) 0xfc00, // -Infinity
+(short) 0xc000, // -2.0
+-Float16.MAX_VALUE, // -6.109476E-5
+Float16.NEGATIVE_ZERO, // -0
+Float16.POSITIVE_ZERO, // +0
+Float16.MIN_VALUE, // 5.9604645E-8
+Float16.MAX_VALUE, // 65504.0
+(short) 0x7c00}; // Infinity
+
+  @Test
+  public void testFloat16ColumnIndex() throws IOException
+  {

Review Comment:
   ```suggestion
 public void testFloat16ColumnIndex() throws IOException {
   ```
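
A note on the ordering used in valuesInAscendingOrder: the entries are raw binary16 bit patterns listed in ascending numeric order, which is not their order as signed shorts (0xfc00 is -1024 as a short, 0xc000 is -16384). The sketch below shows one way to compare such patterns numerically. It assumes Float16.MAX_VALUE is 0x7bff (consistent with its 65504.0 comment), so that -Float16.MAX_VALUE is the pattern 0x8401, and that Float16.MIN_VALUE is 0x0001. The class and method names are hypothetical, NaN is deliberately not handled, and this is not the comparator from the PR.

```java
final class Float16Order {
  // Maps a raw binary16 bit pattern to an int whose natural order matches the
  // numeric order of the encoded values. NaN patterns are not handled.
  static int sortableKey(short bits) {
    int magnitude = bits & 0x7FFF;             // exponent and significand bits
    boolean negative = (bits & 0x8000) != 0;   // sign bit
    // Negative values sort in reverse magnitude order; -0 lands just below +0.
    return negative ? -magnitude - 1 : magnitude;
  }

  static int compare(short a, short b) {
    return Integer.compare(sortableKey(a), sortableKey(b));
  }

  public static void main(String[] args) {
    // Same patterns as the test array, written out as literals (assumed values).
    short[] ascending = {
      (short) 0xfc00, (short) 0xc000, (short) 0x8401, (short) 0x8000,
      (short) 0x0000, (short) 0x0001, (short) 0x7bff, (short) 0x7c00
    };
    for (int i = 1; i < ascending.length; i++) {
      if (compare(ascending[i - 1], ascending[i]) >= 0) {
        throw new AssertionError("out of order at index " + i);
      }
    }
    System.out.println("all patterns are in ascending numeric order");
  }
}
```

Treating -0 as smaller than +0 matches the order of the test array; whether equality between the two zeros is wanted elsewhere is a separate design choice.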







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772837#comment-17772837
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1349533756


##
parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuildersWithLogicalTypes.java:
##
@@ -205,10 +206,20 @@ public void testBinaryAnnotations() {
 }
   }
 
+  @Test
+  public void testFloat16Annotations() {
+LogicalTypeAnnotation[] types = new LogicalTypeAnnotation[] {float16Type()};
+for (final LogicalTypeAnnotation logicalType : types) {

Review Comment:
   It seems that we don't need this for loop?



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.PrimitiveType;
+
+public class Float16Statistics extends BinaryStatistics {

Review Comment:
   If there is no override, do we actually need this subclass?



##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^
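
The field widths listed above (1 sign bit, 5 exponent bits, 10 significand bits) can be pulled out of a raw short with shifts and masks. A minimal sketch with hypothetical helper names, not code from the PR:

```java
final class Float16Fields {
  // Bit 15: sign (1 means negative).
  static int sign(short bits) {
    return (bits >>> 15) & 0x1;
  }

  // Bits 14..10: biased exponent (bias 15 in IEEE 754 binary16).
  static int exponent(short bits) {
    return (bits >>> 10) & 0x1F;
  }

  // Bits 9..0: significand without the implicit leading 1.
  static int significand(short bits) {
    return bits & 0x3FF;
  }

  public static void main(String[] args) {
    short minusTwo = (short) 0xc000; // -2.0 = -1 * 2^(16 - 15) * 1.0
    System.out.println(sign(minusTwo) + " " + exponent(minusTwo) + " " + significand(minusTwo));
  }
}
```

The masks are applied after the shift because a short is sign-extended when it is promoted to int.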


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772099#comment-17772099
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1346873547


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770836#comment-17770836
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1342092712


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769429#comment-17769429
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

gszadovszky commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1337285077


##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769386#comment-17769386
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1337924016


##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -139,6 +140,43 @@ public Statistics build() {
 }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+public Float16Builder(PrimitiveType type) {
+  super(type);
+  assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;

Review Comment:
   ```suggestion
 assert type.getPrimitiveTypeName() == PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
   ```



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -139,6 +140,43 @@ public Statistics build() {
 }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+public Float16Builder(PrimitiveType type) {
+  super(type);
+  assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+}
+
+@Override
+public Statistics build() {
+  Float16Statistics stats = (Float16Statistics) super.build();
+  if (stats.hasNonNullValue()) {
+Binary bMin = stats.genericGetMin();
+Binary bMax = stats.genericGetMax();
+short min = Float16.fromBytesLittleEndian(bMin.getBytes());
+short max = Float16.fromBytesLittleEndian(bMax.getBytes());
+// Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+if (Float16.isNaN(min) || Float16.isNaN(max)) {
+  bMin = Binary.fromConstantByteArray(Float16.toBytesLittleEndian(Float16.POSITIVE_ZERO));

Review Comment:
   It seems worth adding two static constants (of Binary type) to Float16 for 
POSITIVE_ZERO and NEGATIVE_ZERO as they are repeatedly constructed.
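
For readers following the thread, the special cases named in the builder comment (NaN, -0.0 and 0.0) can be illustrated on raw bit patterns. This is a sketch under assumptions, not the PR's code: the rest of build() is not shown in this message, so the widening of zero bounds below is modelled on the usual treatment of floating-point min/max statistics, and all names are hypothetical. The two zero patterns are defined once as constants, in the spirit of the suggestion above (in the PR they would be Binary values).

```java
final class Float16MinMax {
  // Defined once and reused, rather than rebuilt on every call.
  static final short POSITIVE_ZERO = (short) 0x0000;
  static final short NEGATIVE_ZERO = (short) 0x8000;

  // NaN: exponent bits all ones and a non-zero significand.
  static boolean isNaN(short bits) {
    return (bits & 0x7FFF) > 0x7C00;
  }

  // Returns {min, max} after normalization, or null when the bounds should be
  // dropped because NaN makes the ordering undefined.
  static short[] normalize(short min, short max) {
    if (isNaN(min) || isNaN(max)) {
      return null;
    }
    // Widen zero bounds so that neither -0.0 nor +0.0 can be skipped by a filter.
    if (min == POSITIVE_ZERO) {
      min = NEGATIVE_ZERO;
    }
    if (max == NEGATIVE_ZERO) {
      max = POSITIVE_ZERO;
    }
    return new short[] {min, max};
  }
}
```

Returning null for the NaN case is only a convenience of this sketch; the quoted builder instead overwrites the bounds with a zero value.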



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -226,6 +268,11 @@ public static Builder getBuilderForReading(PrimitiveType type) {
 return new FloatBuilder(type);
   case DOUBLE:
 return new DoubleBuilder(type);
+  case BINARY:

Review Comment:
   ```suggestion
 case FIXED_LEN_BYTE_ARRAY:
   ```



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.LogicalTypeAnnotation;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Types;
+
+public class Float16Statistics extends BinaryStatistics
+{
+  // A fake type object to be used to generate the proper comparator
+  private static final PrimitiveType DEFAULT_FAKE_TYPE = Types.optional(PrimitiveType.PrimitiveTypeName.BINARY)
+    .named("fake_binary_float16_type").withLogicalTypeAnnotation(LogicalTypeAnnotation.float16Type());
+
+  /**
+   * @deprecated will be removed in 2.0.0. Use {@link Statistics#createStats(org.apache.parquet.schema.Type)} instead
+   */
+  @Deprecated

Review Comment:
   We shouldn't even add this if it is a deprecated one.



##
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##
@@ -139,6 +140,43 @@ public Statistics build() {
 }
   }
 
+  // Builder for FLOAT16 type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class Float16Builder extends Builder {
+public Float16Builder(PrimitiveType type) {
+  super(type);
+  assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;

Review Comment:
   Please check the fixed length (2) as well.



##
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
###

[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769041#comment-17769041
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

wgtmac commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1734974224

   > @wgtmac @benibus please help revisit this PR once you get a chance?
   
   I will take a look later this week. cc @shangxinli @gszadovszky 






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768887#comment-17768887
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1734364109

   @wgtmac @benibus please help revisit this PR once you get a chance?






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768342#comment-17768342
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132433


##
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java:
##
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
   }
 }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+@Override
+String stringifyNotNull(Binary value) {
+  if (value.length() != 2) {
+return BINARY_INVALID;
+  }
+  ByteBuffer buffer = value.toByteBuffer().order(ByteOrder.LITTLE_ENDIAN);
+  return DEFAULT_STRINGIFIER.stringify(toFloat(buffer.getShort(buffer.position())));

Review Comment:
   Added a Float16.toFloatString(..) method; please check whether it makes
sense.
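
The stringifier in the hunk above reads two little-endian bytes and widens them to a float before formatting. A self-contained sketch of that conversion follows. The signature of the PR's Float16.toFloatString is not shown in this message, so the class, method names and parameter types here are hypothetical, and the widening is a generic IEEE 754 binary16 to binary32 conversion rather than the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Float16Text {
  // Widens a binary16 bit pattern to a Java float by rebuilding the binary32 fields.
  static float toFloat(short bits) {
    int sign = (bits & 0x8000) << 16;   // move the sign to bit 31
    int exp = (bits >>> 10) & 0x1F;     // 5-bit biased exponent
    int frac = bits & 0x3FF;            // 10-bit significand
    if (exp == 0) {
      // Zero or subnormal: value = frac * 2^-24, exactly representable as a float.
      float magnitude = frac * 0x1.0p-24f;
      return sign != 0 ? -magnitude : magnitude;
    }
    if (exp == 0x1F) {
      // Infinity (frac == 0) or NaN (frac != 0).
      return Float.intBitsToFloat(sign | 0x7F800000 | (frac << 13));
    }
    // Normal number: rebias the exponent from 15 to 127 and widen the significand.
    return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (frac << 13));
  }

  // Formats a 2-byte little-endian binary16 value as a decimal string.
  static String toFloatString(byte[] littleEndian) {
    short bits = ByteBuffer.wrap(littleEndian).order(ByteOrder.LITTLE_ENDIAN).getShort();
    return Float.toString(toFloat(bits));
  }

  public static void main(String[] args) {
    System.out.println(toFloatString(new byte[] {0x00, (byte) 0xc0})); // bytes of 0xc000
  }
}
```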







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768341#comment-17768341
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132367


##
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java:
##
@@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset, int length, StringBuilder build
   }
 }
   };
+
+  static final PrimitiveStringifier FLOAT16_STRINGIFIER = new BinaryStringifierBase("FLOAT16_STRINGIFIER") {
+
+@Override
+String stringifyNotNull(Binary value) {
+  if (value.length() != 2) {
+return BINARY_INVALID;

Review Comment:
   Created InvalidFloat16ValueException and now throw it instead of returning an invalid value.
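
A minimal sketch of the fail-fast variant described here: reject any input that is not exactly two bytes instead of returning a placeholder. The exception class below is a hypothetical stand-in (the PR's InvalidFloat16ValueException is not shown in this message), as are the class and method names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Float16Bytes {
  // Hypothetical stand-in for the exception type mentioned in the thread.
  static final class InvalidLengthException extends IllegalArgumentException {
    InvalidLengthException(int length) {
      super("Expected 2 bytes for a float16 value, got " + length);
    }
  }

  // Decodes a little-endian 2-byte value into its raw binary16 bit pattern.
  static short fromBytesLittleEndian(byte[] bytes) {
    if (bytes.length != 2) {
      throw new InvalidLengthException(bytes.length);
    }
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
  }
}
```

Throwing keeps a malformed value from being printed as if it were data; the trade-off is that callers printing many values need to be prepared for the exception.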







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768340#comment-17768340
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132272


##
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * 
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ * 
+ *
+ * The format is laid out as follows:
+ * 
+ * 1   1   11
+ * ^   --^
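
The 1/5/10-bit layout described in the javadoc above can be sketched as follows. This is only an illustration under stated assumptions: the class name `Float16LayoutSketch` and its method names are hypothetical and are not the API of the `Float16` class in the PR. It assumes the usual IEEE 754 binary16 exponent bias of 15.

```java
// Illustrative sketch only; names are hypothetical, not the PR's Float16 API.
final class Float16LayoutSketch {
  /** Sign: the top bit (bit 15). */
  static int sign(short h) {
    return (h >>> 15) & 0x1;
  }

  /** Biased exponent: the next 5 bits (bits 14..10), bias 15. */
  static int exponent(short h) {
    return (h >>> 10) & 0x1F;
  }

  /** Significand: the low 10 bits (bits 9..0). */
  static int significand(short h) {
    return h & 0x3FF;
  }

  /** Widens the 16 raw bits to the single-precision value they encode. */
  static float toFloat(short h) {
    int e = exponent(h);
    int m = significand(h);
    float magnitude;
    if (e == 0x1F) {
      // All-ones exponent: infinity when the significand is zero, NaN otherwise.
      magnitude = (m == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
    } else if (e == 0) {
      // Zero or subnormal: no implicit leading 1, value is m * 2^-24.
      magnitude = m * 0x1p-24f;
    } else {
      // Normal: (1 + m / 1024) * 2^(e - 15); exact in both double and float.
      magnitude = (float) Math.scalb(1.0 + m / 1024.0, e - 15);
    }
    return sign(h) == 0 ? magnitude : -magnitude;
  }

  public static void main(String[] args) {
    short oneThird = (short) 0x3555; // closest float16 to 1/3
    System.out.println(sign(oneThird) + " " + exponent(oneThird) + " " + significand(oneThird));
    System.out.println(toFloat(oneThird));
  }
}
```

Keeping the three field accessors separate makes the special cases (all-ones exponent, zero exponent) easy to read off the layout.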



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768339#comment-17768339
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132034


##
parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java:
##
@@ -0,0 +1,89 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;
+
+public class TestFloat16

Review Comment:
   Added more methods with tests!







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768338#comment-17768338
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131998


##
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;

Review Comment:
   Makes sense, moved!







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768337#comment-17768337
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131904


##
parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java:
##
@@ -990,6 +990,30 @@ private void testUseStatsWithSignedSortOrder(StatsHelper helper) {
 }
   }
 
+  @Test
+  public void testFloat16Stats() {
+BinaryStatistics bStats = new BinaryStatistics();

Review Comment:
   Added **Float16Statistics** and **Float16Builder** in **Statistics**; 
please check whether this makes sense.







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768335#comment-17768335
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131781


##
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveComparator.java:
##
@@ -276,4 +279,24 @@ public String toString() {
   return "BINARY_AS_SIGNED_INTEGER_COMPARATOR";
 }
   };
+
+  /**
+   * This comparator is for comparing two float16 values represented in 2 bytes binary.
+   */
+  static final PrimitiveComparator BINARY_AS_FLOAT16_COMPARATOR = new BinaryComparator() {

Review Comment:
   Added a test for the comparator!
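
   To illustrate why a dedicated comparator is needed for float16 values held in 2 bytes, here is a minimal sketch. It is not the comparator from the PR: the class `Float16OrderSketch` and its helpers are hypothetical, and it assumes the two bytes are little-endian (low byte first). It orders values by mapping the raw bits to a sortable integer key, so negative NaNs sort lowest and positive NaNs highest, which may differ from the PR's NaN handling.

```java
import java.util.Comparator;

// Illustrative sketch only; not the BINARY_AS_FLOAT16_COMPARATOR from the PR.
final class Float16OrderSketch {
  /** Reads the 16 raw bits from a 2-byte little-endian array (assumed byte order). */
  static int rawBits(byte[] twoBytes) {
    return (twoBytes[0] & 0xFF) | ((twoBytes[1] & 0xFF) << 8);
  }

  /**
   * Maps raw half-precision bits to an int whose natural order matches numeric order.
   * Positive values keep their magnitude order above 0x8000; negative values are
   * mirrored below it, so a larger negative magnitude yields a smaller key.
   */
  static int sortKey(int bits) {
    int magnitude = bits & 0x7FFF;
    boolean negative = (bits & 0x8000) != 0;
    return negative ? 0x7FFF - magnitude : 0x8000 + magnitude;
  }

  static final Comparator<byte[]> FLOAT16_ORDER =
      Comparator.comparingInt((byte[] b) -> sortKey(rawBits(b)));

  /** Test helper: encodes raw bits as 2 little-endian bytes. */
  static byte[] littleEndian(int bits) {
    return new byte[] {(byte) bits, (byte) (bits >>> 8)};
  }

  public static void main(String[] args) {
    byte[] oneThird = littleEndian(0x3555); // ~0.3333, bytes {0x55, 0x35}
    byte[] one = littleEndian(0x3C00);      // 1.0,     bytes {0x00, 0x3C}
    // A plain byte-by-byte comparison would look at 0x55 vs 0x00 first and get this wrong.
    System.out.println(FLOAT16_ORDER.compare(oneThird, one) < 0);
  }
}
```

   The example in `main` is the kind of pair a lexicographic byte comparison misorders, because the low byte comes first in the array.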







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768334#comment-17768334
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131705


##
parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java:
##
@@ -0,0 +1,89 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.parquet.util;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.util.Float16.*;
+
+public class TestFloat16
+{
+  @Test
+  public void testFloat16ToFloat() {
+    // Zeroes, NaN and infinities
+    assertEquals(0.0f, toFloat(toFloat16(0.0f)), 0.0f);
+    assertEquals(-0.0f, toFloat(toFloat16(-0.0f)), 0.0f);
+    assertEquals(Float.NaN, toFloat(toFloat16(Float.NaN)), 0.0f);
+    assertEquals(Float.POSITIVE_INFINITY, toFloat(toFloat16(Float.POSITIVE_INFINITY)), 0.0f);
+    assertEquals(Float.NEGATIVE_INFINITY, toFloat(toFloat16(Float.NEGATIVE_INFINITY)), 0.0f);
+    // Known values
+    assertEquals(1.0009765625f, toFloat(toFloat16(1.0009765625f)), 0.0f);
+    assertEquals(-2.0f, toFloat(toFloat16(-2.0f)), 0.0f);
+    assertEquals(6.1035156e-5f, toFloat(toFloat16(6.10352e-5f)), 0.0f); // Inexact
+    assertEquals(65504.0f, toFloat(toFloat16(65504.0f)), 0.0f);
+    assertEquals(0.33325195f, toFloat(toFloat16(1.0f / 3.0f)), 0.0f); // Inexact
+    // Denormals (kept as float16 subnormals, not flushed to +/-0)
+    assertEquals(6.097555e-5f, toFloat(toFloat16(6.09756e-5f)), 0.0f);
+    assertEquals(5.9604645e-8f, toFloat(toFloat16(5.96046e-8f)), 0.0f);
+    assertEquals(-6.097555e-5f, toFloat(toFloat16(-6.09756e-5f)), 0.0f);
+    assertEquals(-5.9604645e-8f, toFloat(toFloat16(-5.96046e-8f)), 0.0f);
+  }
+
+  @Test
+  public void testFloatToFloat16() {
+    // Zeroes, NaN and infinities
+    assertEquals(POSITIVE_ZERO, toFloat16(0.0f));
+    assertEquals(NEGATIVE_ZERO, toFloat16(-0.0f));
+    assertEquals(NaN, toFloat16(Float.NaN));

Review Comment:
   Good suggestion, added!
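
   For readers who want to see what a float-to-float16 narrowing like the one exercised by `testFloatToFloat16` can look like, here is a minimal sketch. It is not the PR's implementation: the class `FloatToFloat16Sketch` is hypothetical, it assumes round-to-nearest-even, and it assumes NaN inputs collapse to the single quiet-NaN pattern `0x7E00` (plus the input's sign bit).

```java
// Illustrative sketch only; not the toFloat16 implementation from the PR.
final class FloatToFloat16Sketch {
  /** Narrows a float to 16 raw half-precision bits, rounding to nearest even. */
  static short toFloat16(float f) {
    int bits = Float.floatToRawIntBits(f);
    int sign = (bits >>> 16) & 0x8000; // float sign moved to bit 15
    int exp = (bits >>> 23) & 0xFF;    // biased float exponent (bias 127)
    int man = bits & 0x7FFFFF;         // 23-bit float significand

    if (exp == 0xFF) {
      // Infinity keeps a zero significand; any NaN becomes a quiet NaN.
      return (short) (sign | 0x7C00 | (man != 0 ? 0x0200 : 0));
    }
    int e = exp - 127 + 15; // re-bias for float16 (bias 15)
    if (e >= 0x1F) {
      return (short) (sign | 0x7C00); // too large: overflow to infinity
    }
    if (e <= 0) {
      if (e < -10) {
        return (short) sign; // below half the smallest subnormal: signed zero
      }
      // Result is a float16 subnormal: make the leading 1 explicit and shift it in.
      man |= 0x800000;
      int shift = 14 - e;
      int out = man >>> shift;
      int rem = man & ((1 << shift) - 1);
      int half = 1 << (shift - 1);
      if (rem > half || (rem == half && (out & 1) == 1)) {
        out++; // may carry into the smallest normal value, which is correct
      }
      return (short) (sign | out);
    }
    // Normal result: keep the top 10 significand bits, round on the dropped 13.
    int out = (e << 10) | (man >>> 13);
    int rem = man & 0x1FFF;
    if (rem > 0x1000 || (rem == 0x1000 && (out & 1) == 1)) {
      out++; // may carry into the exponent, up to infinity, which is correct
    }
    return (short) (sign | out);
  }

  public static void main(String[] args) {
    System.out.printf("0x%04X%n", toFloat16(1.0f / 3.0f) & 0xFFFF);
  }
}
```

   Building the exponent and significand into one integer before rounding lets the carry from `out++` propagate naturally, so no separate renormalization step is needed.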







[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768198#comment-17768198
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

zhangjiashen commented on PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1732203105

   > CI failures are likely due to the fact that the addition of the logical 
type to parquet-format is unmerged, so the specific [PR 
branch](https://github.com/apache/parquet-format/pull/184) needs to be manually 
installed for the build to pass. I'm not sure if there's a good solution yet, 
as this implementation needs to be present for said parquet-format PR to be 
voted on and merged.
   
   Agreed! We can merge the [parquet-format PR](https://github.com/apache/parquet-format/pull/184) 
first, once this diff is ready.






[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-09-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767210#comment-17767210
 ] 

ASF GitHub Bot commented on PARQUET-1647:
-

benibus commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1331975632


##
parquet-common/src/main/java/org/apache/parquet/util/Float16.java:
##
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.util;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE 754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision float can be
+ * created from or converted to single-precision floats, and is stored in a short data type.
+ * The IEEE 754 standard specifies a float16 as having the following format:
+ *
+ * Sign bit: 1 bit
+ * Exponent width: 5 bits
+ * Significand: 10 bits
+ *
+ *
+ * The format is laid out as follows:
+ *
+ * 1   11111   1111111111
+ * ^   --^--   -----^----



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-07-26 Thread Freddy Fostvedt (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747747#comment-17747747
 ] 

Freddy Fostvedt commented on PARQUET-1647:
--

Thanks for putting effort into this, [~benpharkins]. This is a very valuable 
feature at my place of work: reducing memory usage would significantly cut our 
data processing and training costs.



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2023-07-25 Thread Ben Harkins (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747168#comment-17747168
 ] 

Ben Harkins commented on PARQUET-1647:
--

I'm currently working on this, so feel free to assign it to me. It's worth 
mentioning that the current plan is to implement this as a logical type, in 
accordance with the proposal PR for 
[PARQUET-758|https://issues.apache.org/jira/browse/PARQUET-758], which deviates 
from parts of the plan in this issue's description.



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2022-11-14 Thread JAVIER ANDRES RECASENS SANCHEZ (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634092#comment-17634092
 ] 

JAVIER ANDRES RECASENS SANCHEZ commented on PARQUET-1647:
-

[~the_alchemist] Thanks for the update.

Is there anyone who could help with this?



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2022-04-25 Thread The Alchemist (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527761#comment-17527761
 ] 

The Alchemist commented on PARQUET-1647:


[~jrecasens] , [~orecoupa] :

Unfortunately, I have moved on and don't have the time to work on float16 
Parquet support.




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2022-04-25 Thread Orestis (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527758#comment-17527758
 ] 

Orestis commented on PARQUET-1647:
--

[~the_alchemist] Thank you for the initiative. Is there any update on this 
issue?



[jira] [Commented] (PARQUET-1647) [Java] support for Arrow's float16

2022-04-25 Thread JAVIER ANDRES RECASENS SANCHEZ (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527757#comment-17527757
 ] 

JAVIER ANDRES RECASENS SANCHEZ commented on PARQUET-1647:
-

Any updates regarding this issue? We are very interested in float16 support. 
Thanks!
