[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1427471772 > As @julienledem mentioned in the email discussion, let's have the corresponding PRs for support in the Java and C++ implementation ready before we merge this PR. We would like to have implementation support when the new type is released. It might have missed it but I didn't see Julien's reply on the dev mailing list. This seems reasonable though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1412470881 @shangxinli are there guidelines for what needs to happen to accept this addition? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341697557 @anjakefala IIUC, I think the prior objection was around not properly floating point sorting for statistics correctly. I think the main thing is to update the specification to say this requires the same sorting logic as float32 and float64. I need to rereview the current state of things, but then I think we can probably try to vote on the mailing list to see if this type is acceptable. I'm not sure on the exact process here (I don't know if implementations are required before a vote). @gszadovszky thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1341351021 Sorry for the delay but PARQUET-1222 has now been merged, so I think this is unblocked. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1263116115 Sorry for the delay, it sounds like PARQUET-1222 is blocker, let me make a proposal there and see if we can at least come to consensus on approach and maybe this feature can be the first test-case for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1236263820 I think Parquet C++ probably has the same issue. Let me reread [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222). to see what the current state is and if we can push it forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1232420353 > t is not that trivial. For the half-precision floating point numbers we do not have native support for either cpp or java so we can define the total ordering as we want. But we shall do the same for the existing floating point numbers that most languages have native support. Even though they are following the same standard the total ordering either does not exist or have different implementations. See [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) for details. I think these are orthogonal. I might be missing something but it seems like it would not be to hard to case float16 to float in java/cpp and do the comparison in that space and cast it back down. This might not be the most efficient implementation but would be straightforward? I am probably missing something. It would be nice to resolve [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222) so the same semantics would apply to all floating point numbers. > The tricky thing will be the implementations. Even though parquet-mr does not really care about converting the values according to their logical types we still need to care about the logical types at the ordering (min/max values in the statistics). It seems this would require parquet implementations to null out statistics for logical types that they don't support, does parquet-mr do that today? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231187345 It isn't clear to me if this should be a logical type or a physical type. We would need understand if there is different handling for forward compatibility purposes (what do we want the desired behavior to be be). I think C++ might be lenient here, but don't know about parquet-mr @gszadovszky thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231185256 We should probably specify that using the [Byte Split Encodings](https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9) can be used for this type as well? Also, in general, if possible try to avoid force pushing, as it makes it harder to compare iterative changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org