But how do social scientists do research into racism without including ethnicity as a feature in the data?

Best wishes
Andrew

> On 6 Jul 2017, at 17:05, G Reina <gre...@eng.ucsd.edu> wrote:
>
> I'd like to request that the "Boston Housing Prices" dataset in sklearn
> (sklearn.datasets.load_boston) be replaced with the "Ames Housing Prices"
> dataset (https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am
> willing to submit the code change if the developers agree.
>
> The Boston dataset has the feature "Bk is the proportion of blacks in town".
> It is an incredibly racist "feature" to include in any dataset. I think it is
> beneath us as data scientists.
>
> I submit that the Ames dataset is a viable alternative for learning
> regression. The author has shown that the dataset is a more robust
> replacement for Boston. Ames is a 2011 regression dataset on housing prices
> and has more than 5 times the amount of training examples with over 7 times
> as many features (none of which are morally questionable).
>
> I welcome the community's thoughts on the matter.
>
> Thanks.
> -Tony
>
> Here's an article I wrote on the Boston dataset:
> https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn