In my opinion, we shouldn’t strive for a “general purpose” parser. The problem is that websites / data repositories are simply not consistent enough. Also, it’s not solely the maintainers’ fault that things are not consistent: different data formats (csv, sql, hdf5, json, …) have been invented for a reason; sometimes it just makes more sense to store a particular dataset in one format or another.
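To illustrate what I mean by handling one dataset at a time, here is a rough sketch of a loader written for a single, made-up CSV layout. Everything specific in it is an assumption for the example: the function name `load_toy_measurements`, the column layout (two numeric features followed by a label), and the '?' marker for missing values. It returns plain Python lists, which scikit-learn estimators accept as array-like input.

```python
import csv
import io


def load_toy_measurements(raw_text):
    """Parse one specific (hypothetical) CSV layout into (X, y).

    Assumed layout: a header row, then numeric feature columns followed
    by a single label column. Known glitch handled here: missing feature
    values are written as '?'.
    """
    X, y = [], []
    reader = csv.reader(io.StringIO(raw_text))
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines in the file
        *features, label = [cell.strip() for cell in row]
        if "?" in features:
            continue  # drop rows with missing feature values
        X.append([float(value) for value in features])
        y.append(label)
    return X, y


# Toy input with one missing value and one blank line.
raw = "width,height,label\n1.0,2.0,a\n?,3.0,b\n\n4.5,0.5,b\n"
X, y = load_toy_measurements(raw)
```

The point is that the glitch handling (here, dropping rows containing '?') lives in the dataset-specific function and does not have to be anticipated by a generic parser.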
So, instead of aiming for a general purpose parser, it would be a better idea to write a specific function or method for converting a particular dataset from a particular source into a scikit-learn compatible format and accounting for potential glitches. I also think that this is way beyond the scope of scikit-learn, but it sounds like an interesting idea for a side project like "scikit-datasets" or so.

Best,
Sebastian

> On Dec 10, 2015, at 4:06 AM, federico vaggi <vaggi.feder...@gmail.com> wrote:
>
> There was a similar effort here:
> https://groups.google.com/forum/#!searchin/keras-users/datasets/keras-users/n6jE9eFcaYI/Roo-rWK6CQAJ
> - where someone wrote a small library to abstract the loading of open source
> datasets. While having extra dependencies is something that should probably
> be avoided, I don't think it makes a lot of sense to build a lot of code into
> scikit-learn to load and fetch datasets (except the truly common ones like
> newsgroups/iris, etc).
>
> I think with the proliferation of new deep learning libraries that are
> cropping up, it would be good to agree to a common format to load/store
> different datasets without duplicating the effort in many places.
>
> On Wed, 9 Dec 2015 at 19:58 Andreas Mueller <t3k...@gmail.com> wrote:
>
>
> On 12/09/2015 01:48 PM, Gael Varoquaux wrote:
> > On Wed, Dec 09, 2015 at 12:33:55PM -0500, Andreas Mueller wrote:
> >> I guess we use the matlab data with is not required by mldata.
> >> We could add code that tries to fetch the matlab, and if that doesn't
> >> work uses the hdf5,
> > I'd rather not. I'd rather we just have a good error message.
> >
> Never mind, I don't think this was the problem here anyhow.
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general