GitHub user paleolimbot added a comment to the discussion: [C++] Supporting compute functions on ExtensionTypes
> in principle, you can remove the metadata and treat the column as its
> physical type

Just a note that implicit casting to storage only makes sense for some extension types (although it's appropriate for many, like JSON). Something that DuckDB does which is quite nice with respect to its extension types ("aliases") is the ability to register a cast implementation between two types (which includes the option of whether it is implicit or not). That said, the implicit cast to storage is not the end of the world: it just allows nonsensical operations that might be clearer as an error, like multiplying an S2_CELL identifier stored as a uint64 by something, since the result is meaningless.

> modifying data (e.g. appending strings) inevitably strips extension types

I think this is usually the desired behaviour (e.g., a substring of a JSON item is no longer necessarily JSON).

My low-priority personal wishlist for extension type functionality in Arrow C++/pyarrow, based on my experience in geoarrow-pyarrow, would be:

- Ability to register a cast to string (so that my geometry ChunkedArrays and tables are printed more nicely!)
- A compute function to strip extensions (that also works on things that aren't extensions). This is sort of an opt-in version of the implicit cast to storage.
- Ability to register a type2 function (like `vctrs::vec_ptype2()`) and a cast function (like `vctrs::vec_cast()`) to support concatenating extension type arrays that don't have identical storage. Variant will probably need this to be able to handle shredded and unshredded versions in the same `Dataset`.

GitHub link: https://github.com/apache/arrow/discussions/46671#discussioncomment-13370914

----
This is an automatically sent email for user@arrow.apache.org.
To unsubscribe, please send an email to: user-unsubscr...@arrow.apache.org
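
For readers unfamiliar with the vctrs pattern mentioned in the last wishlist item, here is a rough pure-Python sketch of how a registered "type2" (common type) function and a registered cast function could cooperate during concatenation. It is only an illustration of the idea: every name in it (`register_ptype2`, `register_cast`, `concat`, and the toy type names `"wkt"` and `"point_xy"`) is hypothetical, nothing here is an existing Arrow or pyarrow API, and plain Python lists stand in for arrays.

```python
from functools import reduce

# Hypothetical registries; none of these names exist in Arrow or pyarrow.
PTYPE2 = {}  # (type_a, type_b) -> name of their common type
CAST = {}    # (src_type, dst_type) -> function converting a list of values


def register_ptype2(a, b, common):
    """Declare the common type of `a` and `b` (registered symmetrically)."""
    PTYPE2[(a, b)] = common
    PTYPE2[(b, a)] = common


def register_cast(src, dst, fn):
    """Declare how to convert a list of `src` values into `dst` values."""
    CAST[(src, dst)] = fn


def common_type(a, b):
    """Look up the common type, in the spirit of vctrs::vec_ptype2()."""
    if a == b:
        return a
    if (a, b) not in PTYPE2:
        raise TypeError(f"no common type registered for {a!r} and {b!r}")
    return PTYPE2[(a, b)]


def cast(values, src, dst):
    """Convert values to the target type, in the spirit of vctrs::vec_cast()."""
    if src == dst:
        return list(values)
    if (src, dst) not in CAST:
        raise TypeError(f"no cast registered from {src!r} to {dst!r}")
    return CAST[(src, dst)](values)


def concat(chunks):
    """Concatenate (type_name, values) chunks after casting to a common type."""
    if not chunks:
        raise ValueError("need at least one chunk")
    target = reduce(common_type, (type_name for type_name, _ in chunks))
    out = []
    for type_name, values in chunks:
        out.extend(cast(values, type_name, target))
    return target, out


# Toy example: two geometry encodings with different storage.
register_ptype2("wkt", "point_xy", "wkt")
register_cast("point_xy", "wkt", lambda pts: [f"POINT ({x} {y})" for x, y in pts])

result = concat([("wkt", ["POINT (0 0)"]), ("point_xy", [(1, 2)])])
```

The point of splitting the work in two is that the common-type lookup can fail early, before any data is converted, and unregistered combinations raise an error instead of silently falling back to storage.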