Re: [Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Dewey Dunnington
I don't think there is any current barrier to using implementation features of one extension type to help with another. In Python, for example, one might be able to do:
    class GeoJSONExtensionType(pa.ExtensionType):
        def __init__(self):
            self._json_ext = pa.JSONExtensionType()
        def

Re: [Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Ian Cook
But consider that a user might want to define a non-canonical HLLSKETCH extension type and make use of Arrow implementations' features for handling JSON canonical extension type columns in order to handle HLLSKETCH extension type columns. The spec currently does not provide any means to enable

Re: [Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Weston Pace
I think "inheritance" and "composition" are more concerns for implementations than they are for the spec (I could be wrong here). So it seems that it would be sufficient to write the HLLSKETCH's canonical definition as "this is an extension of the JSON logical type and supports all the same storage

Re: [Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Matt Topol
I think the biggest blocker to doing this is the way that we pass extension types through IPC. Extension types are sent as their underlying storage type with metadata key-value pairs of specific keys "ARROW:extension:name" and "ARROW:extension:metadata". Since you can't have multiple values for
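The constraint described in the message above can be sketched with a plain-Python model. This is not the pyarrow API: the `Field` class and the `tag_extension` helper are hypothetical stand-ins, the extension names `arrow.json` and `example.hllsketch` are assumed for illustration, and the model assumes field metadata behaves as a map holding one value per key. Only the two metadata key strings come from the message itself.

```python
from dataclasses import dataclass, field

# Metadata keys quoted in the message above.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field: storage type plus key-value metadata."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def tag_extension(f: Field, ext_name: str, ext_metadata: str = "") -> Field:
    """Return a copy of the field tagged as an extension type over the same storage."""
    tagged = dict(f.metadata)
    tagged[EXT_NAME_KEY] = ext_name          # overwrites any earlier extension name
    tagged[EXT_METADATA_KEY] = ext_metadata  # overwrites any earlier extension metadata
    return Field(f.name, f.storage_type, tagged)


# A string-backed column, tagged first as JSON and then as a type "based on" JSON.
plain = Field("sketch", "utf8")
as_json = tag_extension(plain, "arrow.json")          # assumed canonical name
as_hll = tag_extension(as_json, "example.hllsketch")  # hypothetical non-canonical name

print(as_hll.storage_type)            # the storage type is unchanged
print(as_hll.metadata[EXT_NAME_KEY])  # only the most recent extension name survives
```

Under these assumptions, the second tag replaces the first, so a reader of the field sees only the storage type and a single extension name; nothing in the two keys records that the type was meant to build on another extension type.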

[Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Ian Cook
The vote on adding a JSON canonical extension type [1] got me wondering: Is it possible to define an extension type that is based on a canonical extension type? If so, how? For example, say I wanted to define a (non-canonical) HLLSKETCH extension type that corresponds to the type that Redshift