To my knowledge, "None" has always been the preferred null sentinel
value for object-dtype arrays in pandas, but sometimes these arrays
originate from transposes or other join/append operations that merge
numeric arrays (which have NaN sentinels) into non-numeric arrays to
create an object array.
That won't help in this specific case, since it is for an array of
strings (which you can't fill with NaN); for floating point arrays, we
already use np.nan as the "null" representation when converting to
numpy/pandas.
On Wed, 9 Jun 2021 at 03:37, Benjamin Kietzman wrote:
As a workaround, the "fill_null" compute function can be used to replace
nulls with nans:
>>> nan = pa.scalar(np.NaN, type=pa.float64())
>>> pa.Array.from_pandas(s).fill_null(nan).to_pandas()
On Tue, Jun 8, 2021, 16:15 Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
Hi Li,
It's correct that arrow uses "None" for null values when converting a
string array to numpy / pandas.
As far as I am aware, there is currently no option to control that
(and to make it use np.nan instead), and I am not sure there would be
much interest in adding such an option.
Semantically, a NaN is defined by the IEEE 754 standard for floating
point numbers, while a null represents any value that is undefined,
unknown, etc.
An important set of problems that arrow solves is that it has a native
representation for null values (independent of NaNs): arrow's in-memory
model tracks nulls in a separate validity bitmap.