GitHub user paleolimbot added a comment to the discussion: [C++] Supporting 
compute functions on ExtensionTypes

> in principle, you can remove the metadata and treat the column as its 
> physical type

Just a note that implicit casting to storage only makes sense for some 
extension types (although it's appropriate for many, like JSON). Something 
DuckDB does quite nicely for its extension types ("aliases") is let you 
register a cast implementation between two types, including whether that cast 
is implicit. That said, the implicit cast to storage is not the end of the 
world: it just allows nonsensical operations that might be clearer as an 
error, like multiplying an S2_CELL identifier stored as a uint64 by something, 
where the result is meaningless.
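
To make the "register a cast, and say whether it is implicit" idea concrete, here is a minimal pure-Python sketch. `CastRegistry`, `CastRule`, and the type names are hypothetical and only illustrate the concept; this is not DuckDB's or Arrow's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class CastRule:
    func: Callable[[object], object]
    implicit: bool  # may an engine apply this cast without being asked?


class CastRegistry:
    """Hypothetical registry of casts between named types (illustration only)."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str], CastRule] = {}

    def register(self, source: str, target: str, func, implicit: bool = False) -> None:
        self._rules[(source, target)] = CastRule(func, implicit)

    def cast(self, value, source: str, target: str, explicit: bool = False):
        rule = self._rules.get((source, target))
        if rule is None:
            raise TypeError(f"no cast registered from {source} to {target}")
        if not rule.implicit and not explicit:
            raise TypeError(f"cast from {source} to {target} must be explicit")
        return rule.func(value)


registry = CastRegistry()
# JSON is text, so silently falling back to its string storage is harmless.
registry.register("json", "string", lambda v: v, implicit=True)
# An S2 cell id is an opaque identifier: require an explicit cast before
# it can be treated as a plain number.
registry.register("s2_cell", "uint64", lambda v: v, implicit=False)
```

With these rules, `registry.cast(x, "json", "string")` succeeds without ceremony, while `registry.cast(x, "s2_cell", "uint64")` raises unless `explicit=True` is passed, which is the kind of error that would catch an accidental multiplication.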

> modifying data (e.g. appending strings) inevitably strips extension types

I think this is usually the desired behaviour (e.g., a substring of a JSON 
item is not necessarily JSON anymore).

My low-priority personal wishlist for extension type functionality in Arrow 
C++/pyarrow based on my experience in geoarrow-pyarrow would be:

- Ability to register a cast to string (so that my geometry ChunkedArrays and 
tables are printed more nicely!)
- A compute function to strip extensions (that also works on things that aren't 
extensions). This is sort of an opt-in version of the implicit cast to storage.
- Ability to register a type2 function (like `vctrs::vec_ptype2()`) and a cast 
function (like `vctrs::vec_cast()`) to support concatenating extension type 
arrays that don't have identical storage. Variant will probably need this to be 
able to handle shredded and unshredded versions in the same `Dataset`.

GitHub link: 
https://github.com/apache/arrow/discussions/46671#discussioncomment-13370914
