Re: arrow encoding of nested dictionary?

Wes McKinney Tue, 24 Sep 2019 20:59:49 -0700

The Arrow version of a nested structure will use significantly less
memory than the nested-Python-dictionary version.


We don't have a 100% complete converter from JSON-like data to Arrow
in-memory -- the main thing that's missing is creation of Unions
automatically. For example, the array

[700, 800, {'random string53': 900, 'random string54': 'random string55'}]

would need to be a union of an integer and a struct.

Assuming you don't have heterogeneous arrays and the types of the values
don't change from record to record, you can simply pass a list of
records to pyarrow.array.

- Wes

On Tue, Sep 24, 2019 at 1:26 PM Luke <[email protected]> wrote:
>
> This is a simplified example, but I am trying to figure out what gains can
> be had using Arrow instead of straight nested Python dictionaries for
> something like the following:
>
> {'random string 1': {'field1': {'field11': 'random string 2',
>                                 'field12': 100},
>                      'field2': 200,
>                      'field3': [300,
>                                 400,
>                                 {'random string 3': 500}]
>                     },
>  'random string 4': {'field5': {'field51': 600,
>                                 'field52 ': [700,
>                                             800,
>                                             {'random string53': 900,
>                                              'random string54': 'random string55'}
>                                             ]
>                                  }
>                      }
> }
>
> I didn't see anything that would convert an arbitrary nested dictionary into
> some Arrow structure -- did I miss something?  If there isn't, what are some
> suggestions?  I am doing pretty heavy data analysis where I am handed some
> nested Python dictionaries, or nested JSON that I load into a nested
> Python dictionary.  The memory footprint of these is large: I have
> individual JSON files that, when loaded by json.load, become a 5-6 GB Python
> dictionary (which is a little crazy when the actual JSON file is about 700MB).
>
> curious,
> Luke
