Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]

2025-01-14 Thread via GitHub


tustvold commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2590154743

   > I feel like there must be a semantic difference between comparing NaNs using `.eq` and comparing the bit patterns
   
   We generally use floating-point total ordering, as described [here](https://docs.rs/arrow-ord/latest/arrow_ord/cmp/fn.gt.html). When it comes to NaNs, this is equivalent to byte ordering. As such, IMO this is a bug fix.
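   To illustrate the point about total ordering, here is a minimal sketch that uses Rust's standard `f64::total_cmp` (the IEEE 754 totalOrder predicate) directly, not any arrow-rs kernel; the helper names `total_eq` and `bits_eq` are hypothetical. It shows that `==` rejects a NaN even against itself, while `total_cmp` reports `Equal` exactly when the bit patterns match, so two NaNs with different payloads stay distinct.

   ```rust
   use std::cmp::Ordering;

   /// Hypothetical helper: equality under IEEE 754 total ordering.
   fn total_eq(a: f64, b: f64) -> bool {
       a.total_cmp(&b) == Ordering::Equal
   }

   /// Hypothetical helper: equality of the raw bit patterns.
   fn bits_eq(a: f64, b: f64) -> bool {
       a.to_bits() == b.to_bits()
   }

   fn main() {
       // Two quiet NaNs that differ only in their payload bits.
       let nan_a = f64::from_bits(0x7ff8_0000_0000_0000);
       let nan_b = f64::from_bits(0x7ff8_0000_0000_0001);
       assert!(nan_a.is_nan() && nan_b.is_nan());

       // `==` (PartialEq) never treats a NaN as equal, not even to itself.
       assert!(nan_a != nan_a);

       // Total ordering and bit comparison agree on every pair here.
       for (x, y) in [(nan_a, nan_a), (nan_a, nan_b), (nan_b, nan_b)] {
           assert_eq!(total_eq(x, y), bits_eq(x, y));
       }
       println!("total_eq(nan_a, nan_a) = {}", total_eq(nan_a, nan_a)); // true
       println!("total_eq(nan_a, nan_b) = {}", total_eq(nan_a, nan_b)); // false
   }
   ```

   Because `total_cmp` works by reinterpreting the bits as a signed integer (with a sign-dependent flip), its `Equal` result coincides with bitwise equality for every `f64`, not only for NaNs.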


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]

2025-01-14 Thread via GitHub


alamb commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2590103438

   > > I think technically speaking this could be a behavior change, as previously different NaN representations would be collapsed into a single dictionary entry, but now they will use different entries.
   > 
   > I don't think this is true, as the `NaN == NaN` test being false would have caused all NaN values to get separate entries in the dictionary, so behaviour shouldn't have changed. But it does differ from C++ Arrow, which does collapse all NaNs to a single entry.
   
   I feel like there must be a semantic difference between comparing NaNs using `.eq` and comparing the bit patterns 🤔
   
   However, I suppose at this point it is just an intellectual exercise 🤷
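   For comparison, a minimal sketch of where `.eq` (IEEE 754 `==` on `f64`) and bit-pattern comparison disagree. The helpers `ieee_eq` and `bits_eq` are hypothetical names for illustration. Besides the NaN case, the sketch also shows signed zero, which the thread does not discuss: `0.0 == -0.0` is true even though the two values have different bits, so the two comparisons can disagree in either direction.

   ```rust
   /// Hypothetical helper: IEEE 754 `==`, i.e. what `PartialEq::eq` does for f64.
   fn ieee_eq(a: f64, b: f64) -> bool {
       a == b
   }

   /// Hypothetical helper: compare the raw bit patterns.
   fn bits_eq(a: f64, b: f64) -> bool {
       a.to_bits() == b.to_bits()
   }

   fn main() {
       let nan = f64::from_bits(0x7ff8_0000_0000_0000);

       // Same NaN bit pattern: `==` says unequal, the bits say equal.
       assert!(!ieee_eq(nan, nan));
       assert!(bits_eq(nan, nan));

       // Signed zeros: `==` says equal, the bits say unequal.
       assert!(ieee_eq(0.0, -0.0));
       assert!(!bits_eq(0.0, -0.0));

       // Ordinary finite values: both comparisons agree.
       assert!(ieee_eq(1.5, 1.5) && bits_eq(1.5, 1.5));
       println!("ieee_eq and bits_eq disagree on NaN and on signed zero");
   }
   ```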





Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]

2025-01-13 Thread via GitHub


adamreeve commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2586516918

   > I think technically speaking this could be a behavior change, as previously different NaN representations would be collapsed into a single dictionary entry, but now they will use different entries.
   
   I don't think this is true, as the `NaN == NaN` test being false would have caused all NaN values to get separate entries in the dictionary, so behaviour shouldn't have changed. But it does differ from C++ Arrow, which does collapse all NaNs to a single entry.
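   The behaviours discussed above can be sketched with a toy dictionary builder. This is a hypothetical illustration, not the arrow-rs Parquet writer or the C++ Arrow code. `dict_by_eq` looks values up with `==`, so a NaN never matches an existing entry and every NaN occurrence gets its own entry. `dict_by_bits` keys on `f64::to_bits`, so repeated identical NaN bit patterns share an entry while NaNs with different payloads do not. Its `canonicalize_nan` flag is an assumption that only mimics the collapsing outcome attributed to C++ Arrow: it maps every NaN to `f64::NAN` before lookup.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical: dictionary lookup via `==` (linear scan for clarity).
   /// Returns (dictionary values, per-row dictionary keys).
   fn dict_by_eq(values: &[f64]) -> (Vec<f64>, Vec<usize>) {
       let mut dict: Vec<f64> = Vec::new();
       let mut keys = Vec::with_capacity(values.len());
       for &v in values {
           // NaN == anything is false, so a NaN always falls through to `None`.
           match dict.iter().position(|&d| d == v) {
               Some(i) => keys.push(i),
               None => {
                   dict.push(v);
                   keys.push(dict.len() - 1);
               }
           }
       }
       (dict, keys)
   }

   /// Hypothetical: dictionary lookup keyed on the raw bit pattern.
   /// With `canonicalize_nan`, all NaNs are first replaced by `f64::NAN`.
   fn dict_by_bits(values: &[f64], canonicalize_nan: bool) -> (Vec<f64>, Vec<usize>) {
       let mut index: HashMap<u64, usize> = HashMap::new();
       let mut dict: Vec<f64> = Vec::new();
       let mut keys = Vec::with_capacity(values.len());
       for &v in values {
           let v = if canonicalize_nan && v.is_nan() { f64::NAN } else { v };
           // Insert a new dictionary entry only the first time these bits appear.
           let i = *index.entry(v.to_bits()).or_insert_with(|| {
               dict.push(v);
               dict.len() - 1
           });
           keys.push(i);
       }
       (dict, keys)
   }

   fn main() {
       // Two quiet NaNs that differ only in their payload bits.
       let nan_a = f64::from_bits(0x7ff8_0000_0000_0000);
       let nan_b = f64::from_bits(0x7ff8_0000_0000_0001);
       let values = [1.0, nan_a, nan_a, nan_b, 1.0];

       println!("== lookup:      {} entries", dict_by_eq(&values).0.len()); // 4
       println!("bit lookup:     {} entries", dict_by_bits(&values, false).0.len()); // 3
       println!("canonical NaN:  {} entries", dict_by_bits(&values, true).0.len()); // 2
   }
   ```

   With `values = [1.0, nan_a, nan_a, nan_b, 1.0]`, the three variants produce 4, 3 and 2 dictionary entries respectively: `==` splits every NaN, bit keys split only distinct NaN payloads, and canonicalization collapses all NaNs into one entry.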





Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]

2025-01-11 Thread via GitHub


tustvold merged PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953





Re: [PR] [Parquet] Improve speed of dictionary encoding NaN float values [arrow-rs]

2025-01-10 Thread via GitHub


alamb commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2583852964

   FYI @etseidl 

