[scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Olivier Grisel
Hi all, Thanks to the sustained effort of several contributors (thanks Maria and Lucy in particular), the Boston housing price dataset is no longer used in the examples of scikit-learn (nor in the test suite) in the master branch. To give some context on why this dataset is problematic, please ha

Re: [scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Juan Nunez-Iglesias
I very much like your paragraph, Olivier. I might recommend additionally raising it as a warning when calling the data creation function. For reference, in scikit-image when we removed Lena we raised a warning and returned an alternative (the now-famous `data.astronaut()`) for two versions, bef
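The warn-and-substitute approach Juan describes can be sketched roughly as follows. This is only an illustration, not the actual scikit-image or scikit-learn code: `load_legacy_dataset` and `load_replacement_dataset` are hypothetical names, the returned payload is a placeholder, and the choice of `FutureWarning` is an assumption.

```python
import warnings


def load_replacement_dataset():
    """Hypothetical loader for the replacement data (placeholder payload)."""
    return {"name": "replacement", "data": [1.0, 2.0, 3.0]}


def load_legacy_dataset():
    """Hypothetical deprecated loader.

    During the deprecation window it warns the caller and hands back the
    replacement data instead of the removed dataset.
    """
    warnings.warn(
        "load_legacy_dataset() is deprecated and now returns the replacement "
        "dataset; call load_replacement_dataset() directly.",
        FutureWarning,  # assumption: chosen so end users see it by default
        stacklevel=2,   # attribute the warning to the caller's line
    )
    return load_replacement_dataset()
```

Existing scripts keep running during the transition, while each call to the old function points users at the new one.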

Re: [scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Olivier Grisel
Thanks for your input, this is also an extension I was thinking of. ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Adrin
Isn't the Boston dataset available through openml? Maybe here: https://www.openml.org/d/531 I'm happy to have the dataset out there on openml, and for any material that addresses some of the issues with it. But for educational purposes, we don't need to have the dataset in the package as long as u
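If the dataset does live at the OpenML ID Adrin links (531, which the message itself only offers as a "maybe"), it could be loaded with scikit-learn's `fetch_openml` rather than a bundled loader. A minimal sketch under that assumption; the wrapper name `fetch_boston_from_openml` and its `fetch` parameter are hypothetical, added only so the call can be exercised without network access.

```python
def fetch_boston_from_openml(fetch=None):
    """Fetch OpenML dataset 531 (assumed here to be the Boston housing data).

    `fetch` is a hypothetical injection point; by default it is
    sklearn.datasets.fetch_openml, which downloads and caches the data.
    """
    if fetch is None:
        # Imported lazily so scikit-learn is only needed for a real download.
        from sklearn.datasets import fetch_openml as fetch
    return fetch(data_id=531)
```

Called with no arguments, this needs scikit-learn and a network connection on first use; the object returned by `fetch_openml` exposes the features and target as `.data` and `.target`.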