You could use a struct with two fields: a string key, and a string value
(or union-typed value if you want multiple legal types).

That’s what the pyarrow MapArray type is, under the hood; if you’re in
Python then using map would be simple and convenient.

On Thu, May 18, 2023 at 5:07 PM Robert McLeod <[email protected]> wrote:

> What is considered best practice for storage of fairly large quantities of
> meta-data within Arrow? For the sake of simplicity I'll define metadata as
> key-value pairs, in this case stored in a 2-column SQL table, but it
> equally could be
> ZjQcmQRYFpfptBannerStart
> This Message Is From an Untrusted Sender
> You have not previously corresponded with this sender.
> See https://itconnect.uw.edu/email-tags for additional information.
> Please contact the UW-IT Service Center, [email protected] 206.221.5000, for
> assistance.
>
> ZjQcmQRYFpfptBannerEnd
> What is considered best practice for storage of fairly large quantities of
> meta-data within Arrow? For the sake of simplicity I'll define metadata as
> key-value pairs, in this case stored in a 2-column SQL table, but it
> equally could be a Python dict or similar.
>
> Arrow has a dictionary class but it doesn't appear to permit the type of
> the value to be mutable. Since it seems to be some sort of std map I
> discounted it as a solution, since I have int, float, and string values.
>
> My current thought would be to serialize the data to JSON and write it
> into a `ListArray` with bytes type, with one element in the ListArray for
> every JSON blob I have. It feels a bit kludge but it does work. Is this a
> reasonable approach, or is there a better solution?
>
> Robert
>
>
> --
> Robert McLeod
> [email protected]
> [email protected]
>
>

Reply via email to