Hi Partha,

I believe you have mixed up struct and map types. When you pass a
list of Python dicts to Arrow, it infers a struct type for the dicts by
default, which means that every observed key becomes a field that is
present in every entry (with a null value where the key was missing),
so here the inferred type is something like struct<CORE: struct<0:
struct<status: string>, 1: struct<status: string>, 2: struct<status:
string>>>.

If you want a map type (where each dict can have different keys), you
have to write down the map type explicitly and pass it when
constructing the Arrow array object. What you want is
struct<CORE: map<string, struct<status: string>>> (I think).

- Wes


On Fri, Jan 29, 2021 at 9:23 AM PARTHA DUTTA <[email protected]> wrote:
>
> I may be doing something wrong here, so any help would be greatly 
> appreciated. I am trying to store a nested python dict into an Arrow table, 
> and I am getting some unexpected results. This is sample code:
>
> import copy
> import pyarrow as pa
> import random
>
> def test_it():
>     arr = []
>     for f in range(5):
>         num_maps = random.randrange(4) + 1
>         print("Number of maps = {}".format(num_maps))
>         mdict = {}
>         mdict["CORE"] = {}
>         for r in range(num_maps):
>             mdict["CORE"][str(r)] = {"status": "realized"}
>         arr.append(copy.deepcopy(mdict))
>     tbl = pa.Table.from_pydict({"_map": arr})
>     print(tbl.to_pydict())
>
> test_it()
>
>
> This is the output of the code:
>
> Number of maps = 1
> Number of maps = 1
> Number of maps = 2
> Number of maps = 3
> Number of maps = 2
> {'_map': [{'CORE': {'0': {'status': 'realized'}, '1': None, '2': None}}, 
> {'CORE': {'0': {'status': 'realized'}, '1': None, '2': None}}, {'CORE': {'0': 
> {'status': 'realized'}, '1': {'status': 'realized'}, '2': None}}, {'CORE': 
> {'0': {'status': 'realized'}, '1': {'status': 'realized'}, '2': {'status': 
> 'realized'}}}, {'CORE': {'0': {'status': 'realized'}, '1': {'status': 
> 'realized'}, '2': None}}]}
>
> It seems that when the table is created, it is filling in empty dict values 
> such that the number of elements is completely equal. This is not what I 
> wanted, and I am wondering if this is a feature, or am I missing something 
> such that my intended output would not contain "null" values.
>
> Thanks,
> Partha
> --
> Partha Dutta
> [email protected]
