You could use a struct with two fields: a string key, and a string value (or union-typed value if you want multiple legal types).
That’s what the pyarrow MapArray type is, under the hood; if you’re in Python then using map would be simple and convenient. On Thu, May 18, 2023 at 5:07 PM Robert McLeod <[email protected]> wrote: > What is considered best practice for storage of fairly large quantities of > meta-data within Arrow? For the sake of simplicity I'll define metadata as > key-value pairs, in this case stored in a 2-column SQL table, but it > equally could be > ZjQcmQRYFpfptBannerStart > This Message Is From an Untrusted Sender > You have not previously corresponded with this sender. > See https://itconnect.uw.edu/email-tags for additional information. > Please contact the UW-IT Service Center, [email protected] 206.221.5000, for > assistance. > > ZjQcmQRYFpfptBannerEnd > What is considered best practice for storage of fairly large quantities of > meta-data within Arrow? For the sake of simplicity I'll define metadata as > key-value pairs, in this case stored in a 2-column SQL table, but it > equally could be a Python dict or similar. > > Arrow has a dictionary class but it doesn't appear to permit the type of > the value to be mutable. Since it seems to be some sort of std map I > discounted it as a solution, since I have int, float, and string values. > > My current thought would be to serialize the data to JSON and write it > into a `ListArray` with bytes type, with one element in the ListArray for > every JSON blob I have. It feels a bit kludge but it does work. Is this a > reasonable approach, or is there a better solution? > > Robert > > > -- > Robert McLeod > [email protected] > [email protected] > >
