I work in the financial services industry and build machine learning
models for marketing applications. We put an enormous effort (multiple
layers of oversight and governance) into ensuring that our models are
free of bias against protected classes, etc. Having data on race
and ethnicity (among other attributes) is extremely important for validating
that this is indeed the case. Without it, you have no such assurance.
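To illustrate why the protected attribute is needed, here is a minimal sketch of one common screening check, the adverse impact ratio: each group's rate of favourable model outcomes divided by a reference group's rate. The function name `adverse_impact_ratios`, the toy data, and the 0.8 cut-off (the "four-fifths rule" heuristic from US EEOC guidance) are illustrative assumptions, not a description of any particular firm's governance process. The point is that the check cannot even be computed without the `groups` column.

```python
from collections import defaultdict


def adverse_impact_ratios(decisions, groups, reference):
    """Hypothetical helper: selection rate of each group / selection rate of `reference`.

    decisions: 0/1 model outcomes (1 = favourable, e.g. an offer is made)
    groups: protected-class label for each record, same length as decisions
    reference: label of the comparison group
    """
    selected = defaultdict(int)
    total = defaultdict(int)
    for decision, group in zip(decisions, groups):
        total[group] += 1
        selected[group] += decision
    rates = {g: selected[g] / total[g] for g in total}
    ref_rate = rates[reference]
    if ref_rate == 0:
        raise ValueError("reference group has no favourable outcomes")
    return {g: rate / ref_rate for g, rate in rates.items()}


# Toy data, invented for illustration only.
decisions = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
ratios = adverse_impact_ratios(decisions, groups, reference="A")

# Groups falling below the (assumed) four-fifths threshold get flagged for review.
flagged = {g for g, r in ratios.items() if r < 0.8}
print(ratios, flagged)
```

Here group A is selected 3 times out of 4 and group B once out of 4, so B's ratio is one third and B is flagged. Dropping the race or ethnicity column from the data would not remove such a disparity; it would only make it invisible to this kind of test.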
On 07/06/2017 12:19 PM, Andrew Holmes wrote:
But how do social scientists do research into racism without including
ethnicity as a feature in the data?
Best wishes
Andrew
On 6 Jul 2017, at 17:05, G Reina <gre...@eng.ucsd.edu> wrote:
I'd like to request that the "Boston Housing Prices" dataset in
sklearn (sklearn.datasets.load_boston) be replaced with the "Ames
Housing Prices" dataset
(https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am
willing to submit the code change if the developers agree.
The Boston dataset has the feature "Bk is the proportion of blacks in
town". It is an incredibly racist "feature" to include in any
dataset. I think it is beneath us as data scientists.
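For reference, the dataset description shipped with sklearn's Boston data lists the stored column `B` not as Bk itself but as 1000(Bk - 0.63)^2. The short sketch below (the function name `boston_b` is hypothetical) just evaluates that documented formula, to show what the column encodes: a parabola centred at Bk = 0.63, so towns with Bk = 0 get the largest value (396.9, which is also the column's maximum in the data), and towns equally far above and below 0.63 receive the same `B`.

```python
import math


def boston_b(bk):
    """Hypothetical helper: the documented Boston 'B' column, 1000 * (Bk - 0.63)**2."""
    return 1000 * (bk - 0.63) ** 2


# Bk = 0 gives the largest attainable value for proportions in [0, 1].
print(round(boston_b(0.0), 1))

# Proportions 0.53 and 0.73 are symmetric about 0.63, so they map to (almost exactly) the same B.
print(math.isclose(boston_b(0.53), boston_b(0.73)))
```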
I submit that the Ames dataset is a viable alternative for learning
regression. The paper's author has shown that the dataset is a more robust
replacement for Boston. Ames is a 2011 regression dataset on housing
prices and has more than 5 times as many training examples and
over 7 times as many features (none of which are morally questionable).
I welcome the community's thoughts on the matter.
Thanks.
-Tony
Here's an article I wrote on the Boston dataset:
https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org <mailto:scikit-learn@python.org>
https://mail.python.org/mailman/listinfo/scikit-learn