Ok. That actually aligns closely with what I'm familiar with. Good to know. Thanks again for taking the time to respond,
-Dan Nugent

On Mon, Mar 30, 2020 at 12:38 PM Wes McKinney <[email protected]> wrote:
> Social and technical reasons I guess. Empirically it's just not used much.
>
> You can see my comments about numpy.ma in my 2010 paper about pandas
>
> https://conference.scipy.org/proceedings/scipy2010/pdfs/mckinney.pdf
>
> At least in 2010, there were notable performance problems when using
> MaskedArray for computations
>
> "We chose to use NaN as opposed to using NumPy MaskedArrays for
> performance reasons (which are beyond the scope of this paper), as NaN
> propagates in floating-point operations in a natural way and can be
> easily detected in algorithms."
>
> On Mon, Mar 30, 2020 at 11:20 AM Daniel Nugent <[email protected]> wrote:
> >
> > Thanks! Since I'm just using it to jump to Arrow, I think I'll stick
> > with it.
> >
> > Do you have any feelings about why Numpy's masked arrays didn't gain
> > favor when many data representation formats explicitly support nullity
> > (including Arrow)? Is it just that not carrying nulls in computations
> > forward is preferable (that is, early filtering/value filling was easier)?
> >
> > -Dan Nugent
> >
> >
> > On Mon, Mar 30, 2020 at 11:40 AM Wes McKinney <[email protected]> wrote:
> >>
> >> On Mon, Mar 30, 2020 at 8:31 AM Daniel Nugent <[email protected]> wrote:
> >> >
> >> > Didn’t want to follow up on this on the Jira issue earlier since it's
> >> > sort of tangential to that bug and more of a usage question. You said:
> >> >
> >> > > I wouldn't recommend building applications based on them nowadays
> >> > > since the level of support / compatibility in other projects is low.
> >> >
> >> > In my case, I am using them since it seemed like a straightforward
> >> > representation of my data that has nulls, the format I’m converting from
> >> > has zero cost numpy representations, and converting from an internal
> >> > format into Arrow in memory structures appears zero cost (or close to
> >> > it) as well. I guess I can just provide the mask as an explicit
> >> > argument, but my original desire to use it came from being able to
> >> > exploit numpy.ma.concatenate in a way that saved some complexity in
> >> > implementation.
> >> >
> >> > Since Arrow itself supports masking values with a bitfield, is there
> >> > something intrinsic to the notion of array masks that is not well
> >> > supported? Or do you just mean the specific numpy MaskedArray class?
> >> >
> >>
> >> I mean just the numpy.ma module. Not many Python computing projects
> >> nowadays treat MaskedArray objects as first class citizens. Depending
> >> on what you need it may or may not be a problem. pyarrow supports
> >> ingesting from MaskedArray as a convenience, but it would not be
> >> common in my experience for a library's APIs to return MaskedArrays.
> >>
> >> > If this is too much of a numpy question rather than an arrow
> >> > question, could you point me to where I can read up on masked array
> >> > support or maybe what the right place to ask the numpy community about
> >> > whether what I'm doing is appropriate or not.
> >> >
> >> > Thanks,
> >> >
> >> >
> >> > -Dan Nugent
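
For reference, the two mask conventions discussed in this thread differ mainly in polarity: numpy.ma marks missing entries with True, while the Arrow columnar format stores a validity bitmap in which a set bit means the value is present, packed least-significant bit first. The following is a minimal pure-Python sketch of that mapping, not pyarrow's implementation; the helper names mask_to_validity_bitmap and is_valid are hypothetical and exist only for this illustration.

```python
def mask_to_validity_bitmap(mask):
    """Pack a NumPy-style mask (True = missing) into an Arrow-style
    validity bitmap (bit set = value present, least-significant bit first).

    Hypothetical helper for illustration only.
    """
    bitmap = bytearray((len(mask) + 7) // 8)  # one bit per element, rounded up
    for i, is_masked in enumerate(mask):
        if not is_masked:
            # Polarity flips here: an unmasked entry becomes a set (valid) bit.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Return True if element i is marked present in the validity bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


# Example data: the second and fourth entries are missing.
values = [1.5, 0.0, 3.0, 0.0, 5.0]
mask = [False, True, False, True, False]  # numpy.ma convention: True = masked

bitmap = mask_to_validity_bitmap(mask)
present = [v for i, v in enumerate(values) if is_valid(bitmap, i)]
```

User code would not normally do this packing by hand, since pyarrow.array accepts a boolean mask argument in which True marks a null. The sketch is only meant to show that the mask itself carries over directly and that the difference lies in the container class.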
