[GitHub] [arrow] Dandandan commented on pull request #9596: ARROW-11495: [Rust] Better numerical_coercion

2021-03-01 Thread GitBox


Dandandan commented on pull request #9596:
URL: https://github.com/apache/arrow/pull/9596#issuecomment-788185255


   Also for memory/performance concerns - I think it makes sense to think about 
those as well. We will do some extra copies by casting, so `u8 + u8` will be 
converted if I understand correctly to `c(u8, u16) + c(u8, u16)`, but `u8 + u8 
+ u8` will involve more copies to something like `c(c(u8, u16)+ c(u8, u16), 
u32) + c(u8, u32)` etc. as the subexpressions are coerced to u16 already first. 
This could be optimized out a bit maybe to avoid the double casting, but it 
shows it can be quite inefficient.
   
   Also when doing IO and keeping the result in memory will result in more 
storage costs, longer latency and/or higher memory usage if you don't convert 
them to use smaller types again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan commented on pull request #9596: ARROW-11495: [Rust] Better numerical_coercion

2021-03-01 Thread GitBox


Dandandan commented on pull request #9596:
URL: https://github.com/apache/arrow/pull/9596#issuecomment-788175032


   .
   
   > I think this implementation looks really nice -- thank you @sundy-li . I 
believe that the code does what the PR description says it does.
   > 
   > I wonder if you happen to know what Postgres does in this situation (eg 
adding 2 32-bit numbers together?) I can imagine certain situations when the 
users wants to choose performance over avoiding possible overflow and would 
prefer not to upcast both arguments to 64-bit.
   > 
   > @Dandandan / @andygrove what do you think about what the default behavior 
should be?
   
   > I think this implementation looks really nice -- thank you @sundy-li . I 
believe that the code does what the PR description says it does.
   > 
   > I wonder if you happen to know what Postgres does in this situation (eg 
adding 2 32-bit numbers together?) I can imagine certain situations when the 
users wants to choose performance over avoiding possible overflow and would 
prefer not to upcast both arguments to 64-bit.
   > 
   > @Dandandan / @andygrove what do you think about what the default behavior 
should be?
   
   I think it could be acceptable to do a bit more here than PostgreSQL does? 
It think it will make DataFusion behave differently however from PostgreSQL.
   E.g. PostgreSQL returns an error `integer out of range` on `SELECT 
200 + 200` which I think the coercion will avoid by upcasting 
to 64 bit integers.
   
   > I think this implementation looks really nice -- thank you @sundy-li . I 
believe that the code does what the PR description says it does.
   > 
   > I wonder if you happen to know what Postgres does in this situation (eg 
adding 2 32-bit numbers together?) I can imagine certain situations when the 
users wants to choose performance over avoiding possible overflow and would 
prefer not to upcast both arguments to 64-bit.
   > 
   > @Dandandan / @andygrove what do you think about what the default behavior 
should be?
   
   I think it could be acceptable to do a bit more here than PostgreSQL does? 
It think it will make DataFusion behave differently however from PostgreSQL.
   E.g. PostgreSQL returns an error `integer out of range` on `SELECT 
200 + 200` but I think the coercion will avoid this error by 
upcasting to 64 bit integers.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org