Hi Tony.
I don't think it's a good idea to remove the dataset, given how many
tutorials and examples rely on it.
I also don't think it's a good idea to ignore racial discrimination,
which I guess this feature is trying to capture.
I was recently asked to remove an excerpt of a dataset from my slide
because it was "too racist". It was a random sample of
data from the Adult census dataset. Unfortunately, the US economy
is not color-blind (yet), and the reality is racist.
I haven't done an in-depth analysis of whether this feature is actually
informative, but I don't think your analysis is conclusive.
Including ethnicity in data actually allows us to ensure "fairness" in
certain decision making processes.
Without collecting this data, it would be impossible to ensure that
automated decisions are not influenced by past human biases. Arguably,
that's not what the authors of this dataset were doing.
Check out http://www.fatml.org/ for more on fairness in machine learning
and data science.
Cheers,
Andy
On 07/06/2017 12:05 PM, G Reina wrote:
I'd like to request that the "Boston Housing Prices" dataset in
sklearn (sklearn.datasets.load_boston) be replaced with the "Ames
Housing Prices" dataset
(https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am
willing to submit the code change if the developers agree.
The Boston dataset has the feature "Bk is the proportion of blacks in
town". It is an incredibly racist "feature" to include in any dataset.
I think it is beneath us as data scientists.
I submit that the Ames dataset is a viable alternative for learning
regression. The author has shown that the dataset is a more robust
replacement for Boston. Ames is a 2011 regression dataset on housing
prices and has more than 5 times the number of training examples with
over 7 times as many features (none of which are morally questionable).
I welcome the community's thoughts on the matter.
Thanks.
-Tony
Here's an article I wrote on the Boston dataset:
https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn