Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-09 Thread Wes McKinney
To my knowledge, "None" has always been the preferred null sentinel value for object-dtype arrays in pandas, but since these arrays sometimes originate from transposes or other join/append operations that merge numeric arrays (which have NaN sentinels) into non-numeric arrays to create object array
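As a rough illustration of how both sentinels can end up in one object-dtype array, the sketch below appends an object Series to a float Series. The sample values are made up, and concatenation is only one of the paths mentioned above.

```python
import numpy as np
import pandas as pd

# Numeric data uses NaN as its missing-value sentinel.
numeric = pd.Series([1.5, np.nan])
# Hand-built object-dtype string data typically uses None.
strings = pd.Series(["a", None], dtype=object)

# Appending one to the other falls back to object dtype,
# so the result can carry NaN and None side by side.
mixed = pd.concat([numeric, strings], ignore_index=True)

# pd.isna treats both sentinels as missing.
missing = pd.isna(mixed).tolist()
print(mixed.dtype, missing)
```

Code that checks for missing values with pd.isna is unaffected by the mix; code that compares against None or NaN directly is not.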

2021-06-09 Thread Joris Van den Bossche
That won't help in this specific case, since it is for an array of strings (which you can't fill with NaN), and for floating point arrays, we already use np.nan as "null" representation when converting to numpy/pandas.

On Wed, 9 Jun 2021 at 03:37, Benjamin Kietzman wrote:
>
> As a workaround, the

2021-06-08 Thread Benjamin Kietzman
As a workaround, the "fill_null" compute function can be used to replace nulls with NaNs:

>>> nan = pa.scalar(np.nan, type=pa.float64())
>>> pa.Array.from_pandas(s).fill_null(nan).to_pandas()

On Tue, Jun 8, 2021, 16:15 Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
> Hi Li,
>
> It'

2021-06-08 Thread Joris Van den Bossche
Hi Li,

It's correct that arrow uses "None" for null values when converting a string array to numpy / pandas. As far as I am aware, there is currently no option to control that (and to make it use np.nan instead), and I am not sure there would be much interest in adding such an option. Now, I know

2021-06-08 Thread Jorge Cardoso Leitão
Semantically, a NaN is defined by the IEEE 754 standard for floating-point numbers, while a null represents a value that is undefined, unknown, etc. An important problem that arrow solves is providing a native representation for null values (independent of NaNs): arrow's in-memory mod