Maybe this?

import pyarrow as pa
import pyarrow.compute as pc
import pandas as pd

df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
                   'n_legs': [2, 4, 5, 100],
                   'animals': ["Flamingo", "Horse", "Brittle stars", None]})
table = pa.Table.from_pandas(df)

old_animals = table.column("animals")  # similar to df.loc[:, 'animals']

new_animals = pc.fill_null(old_animals, "new_value")
table2 = table.set_column(2, "animals", new_animals)


I'm not entirely sure whether this solution makes a copy of the original
table or not. If so, you may want to drop the old column and add a new
column rather than creating table2.

I based this on the following cookbook section:
https://arrow.apache.org/cookbook/py/data.html#replacing-a-column-in-an-existing-table



On Mon, Jul 4, 2022 at 5:14 PM H G <[email protected]> wrote:

> Thanks for the input. I can select the rows with null values using
> table.filter(table['animals'].is_null())
>
> However, I am struggling to assign a value to the rows matched by this
> filter. Any suggestions?
>
> On Mon, 4 Jul 2022 at 16:45, Michael <[email protected]>
> wrote:
>
>> This section of the cookbook might help:
>>
>> https://arrow.apache.org/cookbook/py/data.html#filtering-arrays-using-a-mask
>>
>> Also these methods in the compute module.
>>
>>
>> https://arrow.apache.org/docs/python/api/compute.html#selecting-multiplexing
>> https://arrow.apache.org/docs/python/api/compute.html#selections
>>
>> Not at my computer, so apologies for not giving a direct example. I think
>> coalesce might be the method you need.
>>
>>
>> On Mon, Jul 4, 2022 at 12:44 PM H G <[email protected]> wrote:
>>
>>> iloc equivalent for selection by position and setting values?
>>>
>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
>>>                    'n_legs': [2, 4, 5, 100],
>>>                    'animals': ["Flamingo", "Horse", "Brittle stars", None]})
>>> table = pa.Table.from_pandas(df)
>>>
>>> # how do we perform this in pyarrow?
>>> df.loc[df["animals"].isnull(), "animals"] = "new_value"
>>>
>>> I did open this on GitHub, but I assume it is not the forum for queries.
>>>
>>> Thanks
>>>
>> --
>>
>> Michael
>>
>
