But how do social scientists do research into racism without including 
ethnicity as a feature in the data?

Best wishes
Andrew

> On 6 Jul 2017, at 17:05, G Reina <gre...@eng.ucsd.edu> wrote:
> 
> I'd like to request that the "Boston Housing Prices" dataset in sklearn 
> (sklearn.datasets.load_boston) be replaced with the "Ames Housing Prices" 
> dataset (https://ww2.amstat.org/publications/jse/v19n3/decock.pdf). I am
> willing to submit the code change if the developers agree.
> 
> The Boston dataset has the feature "Bk is the proportion of blacks in town". 
> It is an incredibly racist "feature" to include in any dataset. I think it is 
> beneath us as data scientists.
> 
> I submit that the Ames dataset is a viable alternative for learning 
> regression. The author has shown that the dataset is a more robust 
> replacement for Boston. Ames is a 2011 regression dataset on housing prices 
> and has more than 5 times the number of training examples with over 7 times 
> as many features (none of which are morally questionable). 
> 
> I welcome the community's thoughts on the matter.
> 
> Thanks.
> -Tony
> 
> Here's an article I wrote on the Boston dataset:
> https://www.linkedin.com/pulse/hidden-racism-data-science-g-anthony-reina?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bmu67f2GSzj5xHMpSD6M00A%3D%3D
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
