Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-08-12 Thread via GitHub


alamb commented on issue #16248:
URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3180838120

   Here is another list of potential subtrait issues
- https://github.com/apache/datafusion/issues/17159


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-24 Thread via GitHub


gabotechs commented on issue #16248:
URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3001436651

   I imagine that in the same what you were able to encode the extra 
information about the types in https://github.com/apache/datafusion/pull/16503 
as different variations of the same type, there could be an opportunity for 
doing the same thing with this other types.
   
   I'm not sure how well this approach could fit for a `Float16` though... 
Having Substrait's Fp32 type have a type variation that downgrades to Fp16 
seems a bit weird when the type name is explicitly stating that it's a 32 bit 
floating point number.
   
   I've asked the question here 
https://github.com/substrait-io/substrait/issues/822, and my intuition tells me 
that the answer to that question could also set the path forward for other 
types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-07 Thread via GitHub


jkosh44 commented on issue #16248:
URL: https://github.com/apache/datafusion/issues/16248#issuecomment-2952747673

   Most of the cast errors have the same cause, they are trying to cast a type 
from Arrow that doesn't exist in substrait.
   
   - https://github.com/apache/datafusion/issues/16275
   - https://github.com/apache/datafusion/issues/16278
   - https://github.com/apache/datafusion/issues/16285
   - https://github.com/apache/datafusion/issues/16296
   - https://github.com/apache/datafusion/issues/16298
   
   It's probably worth coming up for a solution for all of them together, 
instead of independently. Has it been previously discussed how we should handle 
these kind of types?
   
   I left some comments in https://github.com/apache/datafusion/issues/16285, 
but here's a TLDR: 
   
   - One option is to use UDTs in substrait to represent these. That's what the 
arrow CPP library does (https://github.com/apache/arrow/issues/40695). The 
downside of this approach is that external systems are unlikely to understand 
the UDT.
   - Another approach is to do a lossy conversion to a similar type. Duration 
-> IntervalDay, Float16 -> Fp32, FixedSizeList -> List, Time32 -> Time, Time64 
-> Time. If I understand correctly, these conversions are lossy because they 
lose some semantic information, but we don't have to lower the precision of any 
value. They won't however round trip from arrow -> substrait -> arrow, so we'll 
probably need to update the roundtrip tests to account for this.
   - Continue to return an error, because these types are not supported in 
substrait and update the substrait roundtrip tests to ignore these types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]