[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2023-02-12 Thread via GitHub


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1427471772

   > As @julienledem mentioned in the email discussion, let's have the 
corresponding PRs for support in the Java and C++ implementation ready before 
we merge this PR. We would like to have implementation support when the new 
type is released.
   
   It might have missed it but I didn't see Julien's reply on the dev mailing 
list.  This seems reasonable though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2023-02-01 Thread via GitHub


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1412470881

   @shangxinli are there guidelines for what needs to happen to accept this 
addition?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-12-07 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341697557

   @anjakefala  IIUC, I think the prior objection was around not properly 
floating point sorting for statistics correctly.  I think the main thing is to 
update the specification to say this requires the same sorting logic as float32 
and float64.  I need to rereview the current state of things, but then I think 
we can probably try to vote on the mailing list to see if this type is 
acceptable.  I'm not sure on the exact process here (I don't know if 
implementations are required before a vote).  @gszadovszky thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-12-07 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341351021

   Sorry for the delay but PARQUET-1222 has now been merged, so I think this is 
unblocked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-09-29 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1263116115

   Sorry for the delay, it sounds like PARQUET-1222 is blocker, let me make a 
proposal there and see if we can at least come to consensus on approach and 
maybe this feature can be the first test-case for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-09-03 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1236263820

   I think Parquet C++ probably has the same issue.  Let me reread 
[PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222). to see what 
the current state is and if we can push it forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1232420353

   > t is not that trivial. For the half-precision floating point numbers we do 
not have native support for either cpp or java so we can define the total 
ordering as we want. But we shall do the same for the existing floating point 
numbers that most languages have native support. Even though they are following 
the same standard the total ordering either does not exist or have different 
implementations. See 
[PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) for details.
   
   I think these are orthogonal.  I might be missing something but it seems 
like it would not be to hard to case float16 to float in java/cpp and do the 
comparison in that space and cast it back down.  This might not be the most 
efficient implementation but would be straightforward? I am probably missing 
something.  It would be nice to resolve 
[PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) so the same 
semantics would apply to all floating point numbers.  
   
   > The tricky thing will be the implementations. Even though parquet-mr does 
not really care about converting the values according to their logical types we 
still need to care about the logical types at the ordering (min/max values in 
the statistics).
   
   It seems this would require parquet implementations to null out statistics 
for logical types that they don't support, does parquet-mr do that today?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231187345

   It isn't clear to me if this should be a logical type or a physical type. We 
would need understand if there is different handling for forward compatibility 
purposes (what do we want the desired behavior to be be).  I think C++ might be 
lenient here, but don't know about parquet-mr @gszadovszky thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox


emkornfield commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231185256

   We should probably specify that using the [Byte Split 
Encodings](https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9)
 can be used for this type as well?
   
   Also, in general, if possible try to avoid force pushing, as it makes it 
harder to compare iterative changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org