Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]
tustvold commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2590154743

> I feel like there must be semantic difference between comparing NaNs using .eq and comparing the bit patterns

We generally use floating point total ordering, as described [here](https://docs.rs/arrow-ord/latest/arrow_ord/cmp/fn.gt.html). This is equivalent to byte-ordering when it comes to NaNs. As such, IMO this is a bug fix.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]
alamb commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2590103438

> > I think technically speaking this could be a behavior change as previously different NaN representations would be collapsed into a single dictionary entry, but now they will use different entries.
>
> I don't think this is true as the `NaN == NaN` test being false would have caused all NaN values to get separate entries in the dictionary, so behaviour shouldn't have changed. But it does differ from C++ Arrow which does collapse all NaNs to a single entry.

I feel like there must be a semantic difference between comparing NaNs using `.eq` and comparing the bit patterns 🤔

However, I suppose at this point it is just an intellectual exercise 🤷
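For readers following the `.eq` versus bit-pattern question, here is a minimal sketch using only the Rust standard library, not the arrow-rs code itself. The helper name `compare_three_ways` is hypothetical. It puts `==`, `to_bits` equality, and `f64::total_cmp` side by side for a pair of values.

```rust
use std::cmp::Ordering;

/// Hypothetical helper (not part of arrow-rs): compare two floats with
/// IEEE 754 equality, with raw bit patterns, and with the total order
/// that `f64::total_cmp` implements.
fn compare_three_ways(a: f64, b: f64) -> (bool, bool, Ordering) {
    (
        a == b,                     // PartialEq: false whenever either side is NaN
        a.to_bits() == b.to_bits(), // bitwise: true only for identical encodings
        a.total_cmp(&b),            // total order: NaNs are ordered by sign and payload
    )
}
```

Under these assumptions, a NaN compared with itself is unequal under `==` but equal under both the bitwise and the total-order comparison, while two NaNs with different payloads are distinct under all three.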
Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]
adamreeve commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2586516918

> I think technically speaking this could be a behavior change as previously different NaN representations would be collapsed into a single dictionary entry, but now they will use different entries.

I don't think this is true, as the `NaN == NaN` test being false would have caused all NaN values to get separate entries in the dictionary, so behaviour shouldn't have changed. But it does differ from C++ Arrow, which does collapse all NaNs to a single entry.
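The three deduplication behaviours discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the parquet writer's actual interner: all three function names are hypothetical, and the "collapsing" variant only mimics the behaviour attributed above to C++ Arrow.

```rust
use std::collections::HashSet;

/// Hypothetical: deduplicate with `==`. Because `NaN == NaN` is false,
/// every NaN occurrence ends up with its own entry.
fn distinct_by_eq(values: &[f64]) -> usize {
    let mut dict: Vec<f64> = Vec::new();
    for &v in values {
        if !dict.iter().any(|&d| d == v) {
            dict.push(v);
        }
    }
    dict.len()
}

/// Hypothetical: deduplicate on the raw bit pattern. Identical NaN
/// encodings share an entry; different payloads stay separate.
fn distinct_by_bits(values: &[f64]) -> usize {
    values.iter().map(|v| v.to_bits()).collect::<HashSet<u64>>().len()
}

/// Hypothetical: map every NaN to one canonical bit pattern first, so
/// all NaNs collapse into a single entry.
fn distinct_collapsing_nans(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| if v.is_nan() { f64::NAN.to_bits() } else { v.to_bits() })
        .collect::<HashSet<u64>>()
        .len()
}
```

For an input such as `[1.0, nan_a, nan_a, nan_b, 1.0]`, where `nan_a` and `nan_b` are NaNs with different payloads, the three helpers would report 4, 3, and 2 distinct entries respectively.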
Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]
tustvold merged PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953
Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]
alamb commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2583852964

FYI @etseidl