[jira] [Comment Edited] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-06-05 Thread Anja Boskovic (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729402#comment-17729402
 ] 

Anja Boskovic edited comment on PARQUET-758 at 6/5/23 6:47 PM:
---

Hi Gabor!

I would support a proposal for implementing bfloat16, maybe even as a canonical 
extension type in Arrow.

However, I am hesitant to include that in this round of implementations. I 
think it should be considered separately.

1. My understanding is that the implementations have already begun (I messaged 
the parties working on the implementations to create appropriate tickets). 
2. It would prolong the format review and implementations.
3. Part of that prolonging is that I foresee additional back-and-forth 
debating why "bfloat16": why not tensorfloat? Why not add both?

And my experience has been that these conversations take a really long time for 
the Parquet community. It could easily add months to this process.

Float16, being an IEEE standard, is straightforward to include.

So, I guess my takeaway is that I support opening a separate format PR for 
bfloat16 inclusion, and having that proceed separately from the work of 
including, and implementing, IEEE float16.
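
For reference, a minimal sketch (assuming only numpy; not part of the original 
thread) contrasting the two 16-bit layouts under discussion: IEEE binary16 uses 
1 sign / 5 exponent / 10 mantissa bits, while bfloat16 keeps float32's 8 
exponent bits and cuts the mantissa to 7.

import numpy as np

def binary16_bits(x: float) -> int:
    # numpy's float16 is IEEE 754 binary16 (1 sign / 5 exponent / 10 mantissa).
    return int(np.float16(x).view(np.uint16))

def bfloat16_bits(x: float) -> int:
    # bfloat16 is the top half of the float32 encoding (1 / 8 / 7).
    # Truncation shown; real converters usually round to nearest even.
    return int(np.float32(x).view(np.uint32)) >> 16

for v in (1.0, 65504.0, 1e-5):  # 65504 is the largest finite binary16 value
    print(f"{v:>10}  binary16={binary16_bits(v):016b}  bfloat16={bfloat16_bits(v):016b}")

Binary16 overflows past 65504 but resolves nearby values more finely, while 
bfloat16 keeps float32's full range at the cost of precision; that tradeoff is 
what the bfloat16/tensorfloat debate above is about.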


was (Author: JIRAUSER288952):
Hi Gabor!

I would support a proposal for implementing bfloat16, maybe even as a canonical 
extension type in Arrow.

However, I am hesitant to include that in this round of implementations. I 
think it should be considered separately.

1. My understanding is that the implementations have already begun (I messaged 
the parties working on the implementations to create appropriate tickets). 
2. It would add time to the format review and then to the implementations.
3. Part of that is that I foresee additional back-and-forth debating why 
"bfloat16": why not tensorfloat? Why not add both?

And my experience has been that these conversations take a really long time for 
the Parquet community.

Float16, being an IEEE standard, is straightforward to include.

So, I guess my takeaway is that I support opening a separate format PR for 
bfloat16 inclusion, and having that proceed separately from the work of 
including, and implementing, IEEE float16.

> [Format] HALF precision FLOAT Logical type
> --
>
> Key: PARQUET-758
> URL: https://issues.apache.org/jira/browse/PARQUET-758
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Julien Le Dem
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Feedback needed on PR #184 apache/parquet-format

2023-01-16 Thread Anja Boskovic

Hello!

I have a PR open proposing the addition of a float16/half-float logical
type in the parquet-format:
https://github.com/apache/parquet-format/pull/184


I am looking for feedback on what the next step is. Does the PR need an
additional round of reviews before I send a poll to the mailing list? If
it does, do you have advice on who I could ask for a review? Does an
implementation need to occur before the mailing list poll?

Thanks

~* Anja


Interest in adding the float16 logical type to the Parquet spec

2022-08-23 Thread Anja
Hello!

Is there interest in having the float16 logical type standardised in the
Parquet spec? I am proposing a PR for Arrow that will write float16 to
Parquet as FixedSizeBinary (https://issues.apache.org/jira/browse/ARROW-17464),
but for the sake of portability between data analysis tools, it would of
course be much better to have this type standardised in the format itself.
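
A minimal sketch (assuming pyarrow and numpy; this is not the actual
ARROW-17464 patch, and the column and file names are illustrative) of what
that FixedSizeBinary workaround looks like:

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

values = np.array([1.0, 0.5, -2.25], dtype=np.float16)

# Each IEEE binary16 value occupies exactly 2 bytes, so store the raw
# bytes in a fixed_size_binary(2) column.
packed = pa.array([v.tobytes() for v in values], type=pa.binary(2))
pq.write_table(pa.table({"half": packed}), "halfs.parquet")

# Readers must know, out of band, to reinterpret the payload as float16;
# that is exactly the portability gap a standardised logical type closes.
col = pq.read_table("halfs.parquet").column("half")
restored = np.frombuffer(b"".join(col.to_pylist()), dtype=np.float16)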

Previous requests for this have been here:
https://issues.apache.org/jira/browse/PARQUET-1647 and here:
https://issues.apache.org/jira/browse/PARQUET-758 .

With the development of neural networks, half-precision floating-point
numbers are becoming more popular
(https://en.wikipedia.org/wiki/Half-precision_floating-point_format); I do
think that demand exists for their support. I am new to the project, but I am
happy to contribute development time if there is support for this feature,
and some guidance.

Warm regards,

Anja