I've moved this to python-ideas where it is more appropriate, as Chris notes.

On Thu, Oct 21, 2021, 8:42 PM Chris Angelico <ros...@gmail.com> wrote:

> On Fri, Oct 22, 2021 at 3:23 AM David Mertz, Ph.D.
> <david.me...@gmail.com> wrote:
> >
> > On Thu, Oct 21, 2021 at 2:52 AM Steven D'Aprano <st...@pearwood.info>
> > wrote:
> >>
> >> On Tue, Oct 19, 2021 at 05:09:42PM -0700, Michael Selik wrote:
> >> > None and its ilk often conflate too many qualities. For example,
> >> > is it missing because it doesn't exist, it never existed, or
> >> > because we never received a value, despite knowing it must exist?
> >
> >
> >>
> >> 30+ years later, and we cannot easily, reliably or portably use NAN
> >> payloads. Most people don't care. If we offered them a dozen or a
> >> thousand distinct sentinels for all the various kinds of missing
> >> data, how many people would use them and how many would just stick
> >> to plain old None?
> >
> >
> > In data science, I have been frustrated by the sparsity of ways of
> > spelling "missing value."
>
> Might be worth redirecting this to -ideas.
>
> > Besides the distinction Michael points out, and that Steven did in
> > relation to NaNs with payloads, I encounter missingness of various
> > other sorts as well. Crucially, an important kind of missing data is
> > data where the value I received seems unreliable and I have decided
> > to *impute* missingness rather than accept a value I believe is
> > unreliable.
> >
> > But there is also something akin to what Michael points out (maybe
> > it's just an example). For example, "middle name" is something that
> > some people simply do not have, other people choose not to provide
> > on a survey, and others still we just don't know anything beyond
> > "it's not there."
> >
>
> And some people have more than one (I have a brother with two of
> them). Not the best example to use, since names have WAY more
> complexities than different types of absence, but there are other
> cases where that sort of thing comes up. For instance, if someone says
> on a survey that s/he is in Australia, and then you ask for a
> postcode, then leaving it blank should be recorded as "chose not to
> provide"; but if the country is listed as Timor-Leste / East Timor,
> then "not applicable" would be appropriate, since the country doesn't
> use postal codes.
>
> > Of course, when I impute missingness, I can do so at various stages
> > of data cleaning, and for various different reasons or confidences.
> > None (or NaN) are sort of OK, but carrying metadata as to the nature
> > of missingness would be nice.
> >
>
> Right. Using postcodes as an example again, for someone in Australia,
> a postcode of "E3B 0H8" doesn't make sense, as that isn't the format
> we use. So you could wipe that out and replace it with "No postal
> code, malformed data entered".
>
> > So my strawman suggestion is tagging None's. I suppose spellings
> > like `None[reason]` or `None(reason)` are appealing.
> >
> > An obvious problem that I recognize is that it's not obvious this
> > can "play nice" with the common idiom `if mydata is not None: ...`.
> > None really is a singleton, and a "tagged singleton" or "annotated
> > singleton" probably doesn't work well with Python's object model.
> >
> > My goal, of course, would be to have TaggedNone be a kind of
> > subclass of None, in the same way that bool is a subclass of int,
> > and hence True is a kind of 1. However, I'd want a large number of
> > custom None's, with some sort of accessible string or numeric code
> > or something to inspect which one it was.
> >
>
> But this is where I start to disagree. None should remain a singleton,
> but "no data available" could be its own thing, tied in with the way
> that you do your data storage and stats. As such, you wouldn't be
> checking it with 'is', so you wouldn't have that problem (the Python
> 'is' operator will only ever test for actual object identity).
>
> Keep None simple and dependable, and then "Missing Data" can be an
> entire class of values if you so desire.
>
> ChrisA
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/6NY5NQCJR3ROFBWWFOVD47HJFBQJC3IZ/
> Code of Conduct: http://python.org/psf/codeofconduct/
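
To make the "Missing Data as its own class of values" idea from the quoted exchange concrete, here is a minimal sketch. It is only an illustration, not a proposal for the language or an existing API: the names `Missing`, `NOT_PROVIDED`, `NOT_APPLICABLE`, `MALFORMED` and `clean_postcode` are all hypothetical, and the 4-digit check for Australian postcodes is my own assumption about what "the format we use" means. It deliberately does not subclass None (NoneType cannot be subclassed anyway) and leaves None untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Missing:
    """Hypothetical marker for an absent value; `reason` records why."""
    reason: str

    def __bool__(self) -> bool:
        # Falsy, like None, so truthiness checks still treat it as absent.
        return False


# Hypothetical reason codes, borrowed from the postcode examples above.
NOT_PROVIDED = Missing("chose not to provide")
NOT_APPLICABLE = Missing("not applicable")
MALFORMED = Missing("no postal code, malformed data entered")


def clean_postcode(country, raw):
    """Return the postcode string, or a Missing marker explaining its absence."""
    if country == "Timor-Leste":
        # Per the thread: the country does not use postal codes.
        return NOT_APPLICABLE
    if not raw:
        return NOT_PROVIDED
    # Assumption: an Australian postcode is exactly four digits.
    if country == "Australia" and not (raw.isdigit() and len(raw) == 4):
        return MALFORMED
    return raw
```

With something like this, code would test `isinstance(value, Missing)` and then inspect `value.reason`, rather than using `value is None`, so the `is` idiom and the None singleton are not affected.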