Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]
alamb commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3180838120 Here is another list of potential subtrait issues - https://github.com/apache/datafusion/issues/17159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]
gabotechs commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3001436651 I imagine that in the same what you were able to encode the extra information about the types in https://github.com/apache/datafusion/pull/16503 as different variations of the same type, there could be an opportunity for doing the same thing with this other types. I'm not sure how well this approach could fit for a `Float16` though... Having Substrait's Fp32 type have a type variation that downgrades to Fp16 seems a bit weird when the type name is explicitly stating that it's a 32 bit floating point number. I've asked the question here https://github.com/substrait-io/substrait/issues/822, and my intuition tells me that the answer to that question could also set the path forward for other types. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]
jkosh44 commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-2952747673 Most of the cast errors have the same cause, they are trying to cast a type from Arrow that doesn't exist in substrait. - https://github.com/apache/datafusion/issues/16275 - https://github.com/apache/datafusion/issues/16278 - https://github.com/apache/datafusion/issues/16285 - https://github.com/apache/datafusion/issues/16296 - https://github.com/apache/datafusion/issues/16298 It's probably worth coming up for a solution for all of them together, instead of independently. Has it been previously discussed how we should handle these kind of types? I left some comments in https://github.com/apache/datafusion/issues/16285, but here's a TLDR: - One option is to use UDTs in substrait to represent these. That's what the arrow CPP library does (https://github.com/apache/arrow/issues/40695). The downside of this approach is that external systems are unlikely to understand the UDT. - Another approach is to do a lossy conversion to a similar type. Duration -> IntervalDay, Float16 -> Fp32, FixedSizeList -> List, Time32 -> Time, Time64 -> Time. If I understand correctly, these conversions are lossy because they lose some semantic information, but we don't have to lower the precision of any value. They won't however round trip from arrow -> substrait -> arrow, so we'll probably need to update the roundtrip tests to account for this. - Continue to return an error, because these types are not supported in substrait and update the substrait roundtrip tests to ignore these types. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
