Thanks, Wes, for the explanation; I was missing the need for the union. I was pretty amazed at how much more memory the nested Python dict used than the size of the JSON file on disk, especially given how verbose JSON is.
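For anyone who wants to see the gap on their own data, here is a rough sketch that compares the size of the JSON text with an estimate of the nested dict's footprint. The `deep_sizeof` helper and the sample records are made up for illustration, and `sys.getsizeof` only gives an approximation, so treat the numbers as indicative rather than exact.

```python
import json
import sys


def deep_sizeof(obj, seen=None):
    """Approximate the memory held by obj and everything it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects (e.g. repeated keys) only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


# Made-up records, loosely shaped like the example in this thread.
data = {
    f"random string {i}": {
        "field1": {"field11": f"value {i}", "field12": i},
        "field2": i * 2,
        "field3": [i, i + 1, {"nested": i + 2}],
    }
    for i in range(10_000)
}

json_bytes = len(json.dumps(data).encode("utf-8"))
dict_bytes = deep_sizeof(data)
print(f"JSON text:             {json_bytes:,} bytes")
print(f"Python dict (approx.): {dict_bytes:,} bytes")
```

Most of the overhead comes from every small dict, list, string, and int being its own Python object with its own header.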
-Luke

On Tue, Sep 24, 2019 at 11:59 PM Wes McKinney <[email protected]> wrote:

> The Arrow version of a nested structure will use significantly less
> memory than the nested-Python-dictionary version.
>
> We don't have a 100% complete converter from JSON-like data to Arrow
> in-memory -- the main thing that's missing is creation of Unions
> automatically. For example, the array
>
> [700, 800, {'random string53': 900, 'random string54': 'random string55'}]
>
> would need to be a union of an integer and a struct.
>
> Assuming you don't have heterogeneous arrays and the type of values
> don't change from record to record, you can simply pass a list of
> records to pyarrow.array
>
> - Wes
>
> On Tue, Sep 24, 2019 at 1:26 PM Luke <[email protected]> wrote:
> >
> > This is a simplified example but trying to figure out what gains can be
> > had using arrow vice straight nested python dictionaries for something like
> > the following:
> >
> > {'random string 1': {'field1': {'field11': 'random string 2',
> >                                 'field12': 100},
> >                      'field2': 200,
> >                      'field3': [300,
> >                                 400,
> >                                 {'random string 3': 500}]
> >                      },
> >  'random string 4': {'field5': {'field51': 600,
> >                                 'field52 ': [700,
> >                                              800,
> >                                              {'random string53': 900,
> >                                               'random string54': 'random string55'}
> >                                              ]
> >                                 }
> >                      }
> > }
> >
> > I didn't see anything that would convert an arbitrary nested dictionary
> > into some arrow structure -- did I miss something? If there isn't what are
> > some suggestions. I am doing pretty heavy data analysis where I am handed
> > some nested python dictionaries or nested json that I am loading into a
> > nested python dictionary. The memory footprint on these are large and I
> > have individual json files when loaded by json.load becomes a 5-6 GB python
> > dictionary (which is a little crazy when the actual json files is like
> > 700MB).
> >
> > curious,
> > Luke
