[DISCUSS][Arrow] Extension metadata encoding design

2023-08-16 Thread Jeremy Leibs
Hello, I've recently started working with extension types as part of our project, and I was surprised to discover that extension types are required to pack all of their own metadata into a single string value of the "ARROW:extension:metadata" key. This in turn means we have to endure arbitrar
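
To illustrate the constraint described in this message, here is a minimal sketch of an extension type's parameters being packed into one string under the "ARROW:extension:metadata" key. It uses only the Python standard library; the choice of JSON as the encoding, the helper names, and the "example.point_cloud" type with its parameters are assumptions made for illustration and do not come from the thread.

```python
import json

# Reserved field-metadata keys used by Arrow extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def encode_extension_metadata(name, params):
    """Pack structured parameters into the single metadata string.

    JSON is an assumed encoding; each extension type chooses its own.
    """
    return {
        EXTENSION_NAME_KEY: name,
        EXTENSION_METADATA_KEY: json.dumps(params, sort_keys=True),
    }


def decode_extension_metadata(field_metadata):
    """Recover the extension name and its parameters from field metadata."""
    name = field_metadata[EXTENSION_NAME_KEY]
    params = json.loads(field_metadata[EXTENSION_METADATA_KEY])
    return name, params


# Hypothetical extension type and parameters, for illustration only.
field_metadata = encode_extension_metadata(
    "example.point_cloud", {"frame": "world", "dims": 3}
)
name, params = decode_extension_metadata(field_metadata)
```

Every consumer has to know which encoding the producer chose before it can read any individual parameter, which is the custom (de)serialization burden the thread goes on to discuss.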

Re: [DISCUSS][Arrow] Extension metadata encoding design

2023-08-16 Thread Jeremy Leibs
ss > (such as UUID, JSON, BSON...). > > It does imply that extension types with sophisticated parameterization > must implement a custom (de)serialization mechanism themselves. I'm not > sure this tradeoff was discussed at the time, perhaps other people (Wes? > Jacques?) may

Re: [DISCUSS][Arrow] Extension metadata encoding design

2023-08-16 Thread Jeremy Leibs
, we cannot easily change > this anymore. > > Regards > > Antoine. > > > > On 16/08/2023 at 17:48, Jeremy Leibs wrote: > > Thanks for the context, Antoine. > > > > However, even in those examples, I don't really see how coercing the > >

Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-13 Thread Jeremy Leibs
On Wed, Sep 13, 2023 at 8:38 AM Antoine Pitrou wrote: > > On 13/09/2023 at 02:37, Rok Mihevc wrote: > > > >* **ragged_dimensions** = indices of ragged dimensions whose sizes may > > differ. Dimensions where all elements have the same size are called > > uniform dimensions. Indices
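
The quoted definition of ragged versus uniform dimensions can be sketched in plain Python as follows. This is an illustration of the wording only, not part of the proposal; the function name and the sample shapes are hypothetical.

```python
def ragged_dimensions(shapes):
    """Return indices of dimensions whose size differs between tensors."""
    if not shapes:
        return []
    ndim = len(shapes[0])
    if any(len(shape) != ndim for shape in shapes):
        raise ValueError("all tensors must have the same number of dimensions")
    # A dimension is ragged when more than one distinct size occurs in it.
    return [d for d in range(ndim) if len({shape[d] for shape in shapes}) > 1]


# Hypothetical shapes of three tensors stored in one column.
shapes = [(2, 3, 4), (5, 3, 4), (1, 3, 4)]
ragged = ragged_dimensions(shapes)
# Every dimension that is not ragged is uniform.
uniform = [d for d in range(len(shapes[0])) if d not in ragged]
```

Here only the first dimension varies between tensors, so it is the single ragged dimension and the remaining two are uniform.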

Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-13 Thread Jeremy Leibs
Additionally, after reviewing, I also think the introduction of permutations requires a bit more clarification. Please consider adding some wording and an example such as: With the exception of the permutation parameter, all other lists and storage within the Tensor and the extension parameters d

Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-15 Thread Jeremy Leibs
On Fri, Sep 15, 2023 at 8:32 PM Rok Mihevc wrote: > > How about also changing shape and adding uniform_shape like so: > """ > **shape** is a ``FixedSizeList[ndim_ragged]`` of ragged shape > of each tensor contained in ``data`` where the size of the list > ``ndim_ragged`` is equal to the number of

[DISCUSS] Approach to generic schema representation

2024-07-08 Thread Jeremy Leibs
I'm looking for any advice folks may have on a generic way to document and represent expected Arrow schemas as part of an interface definition. For context, our library provides a cross-language (Python, C++, Rust) SDK for logging semantic multi-modal data (point clouds, images, geometric transfor

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Jeremy Leibs
st message of an IPC stream is a Schema message which > > consists solely of a flatbuffer message and defined in the Schema.fbs > file > > of the Arrow repo. All of the libraries that can read Arrow IPC should be > > able to also handle converting a single IPC schema message bac

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Jeremy Leibs
the same language (e.g. rust->rust) since it allows the different > > libraries to use different arrow versions. > > > > There is one other approach if you only need intra-process serialization > > (e.g. between threads / libraries in the same process). You can use the

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Jeremy Leibs
> > > > -Original Message- > > > From: Weston Pace > > > Sent: Monday, July 8, 2024 9:43 AM > > > To: dev@arrow.apache.org > > > Subject: Re: [DISCUSS] Approach to generic schema representation > > > > > > External Email: Use ca