Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-10 Thread Gael Varoquaux
https://jaberg.github.io/skdata/ On Thu, Dec 10, 2015 at 02:12:19PM -0500, Sebastian Raschka wrote: > In my opinion, I think we shouldn’t strive for a “general purpose” parser. > The problem is that websites / data repositories are simply not consistent > enough. Also, it’s not solely the mainta

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-10 Thread Sebastian Raschka
In my opinion, I think we shouldn’t strive for a “general purpose” parser. The problem is that websites / data repositories are simply not consistent enough. Also, it’s not solely the maintainers’ fault that things are not consistent: Different data formats (csv, sql, hdf5, json, …) have been inv

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-10 Thread federico vaggi
There was a similar effort here: https://groups.google.com/forum/#!searchin/keras-users/datasets/keras-users/n6jE9eFcaYI/Roo-rWK6CQAJ - where someone wrote a small library to abstract the loading of open source datasets. While having extra dependencies is something that should probably be avoided,

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Andreas Mueller
On 12/09/2015 01:48 PM, Gael Varoquaux wrote: > On Wed, Dec 09, 2015 at 12:33:55PM -0500, Andreas Mueller wrote: >> I guess we use the matlab data which is not required by mldata. >> We could add code that tries to fetch the matlab, and if that doesn't >> work uses the hdf5, > I'd rather not. I'd

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Gael Varoquaux
On Wed, Dec 09, 2015 at 06:17:02PM +, Luca Puggini wrote: > I would really like to have an easy way to import public datasets. So would I. But is scikit-learn the right place to solve this problem? Gaël

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Gael Varoquaux
On Wed, Dec 09, 2015 at 12:33:55PM -0500, Andreas Mueller wrote: > I guess we use the matlab data which is not required by mldata. > We could add code that tries to fetch the matlab, and if that doesn't > work uses the hdf5, I'd rather not. I'd rather we just have a good error message. G

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Andreas Mueller
Actually, this data file is not even in hdf5 format (maybe if it was, the matlab file would be created automatically?) It is just an upload of a csv file (without a file extension) in a zip file. This is not really a supported format for mldata. How could we read that as a numpy array? That requires serio
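
[Editor's sketch, not part of the original message.] To make the question above concrete: reading an extension-less CSV out of a zip archive is mechanically simple once the layout is known; the hard part raised in the thread is guessing that layout in general. The snippet below is a minimal sketch using only the Python standard library. The function name read_csv_from_zip, the member name "mydata", the comma delimiter, and the all-numeric content are assumptions made for illustration; nothing here describes the actual mhc-nips11 file.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes, member=None, delimiter=","):
    """Return rows of floats from a CSV stored inside a zip archive.

    Assumes every cell is numeric and that there is no header row.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # With no member name given, fall back to the first entry.
        name = member if member is not None else archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter=delimiter)
            return [[float(cell) for cell in row] for row in reader if row]


def make_example_zip():
    """Build a tiny in-memory zip holding a CSV without a file extension."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mydata", "1.0,2.0,3.0\n4.0,5.0,6.0\n")
    return buffer.getvalue()


rows = read_csv_from_zip(make_example_zip())
```

The resulting list of rows can be handed to numpy.asarray to obtain an array. Delimiter, header, and column types still have to be supplied by the caller, which is the part that does not generalize.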

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Andreas Mueller
Why should it warn? It works 100% in fetching matlab formatted data on mldata. On 12/09/2015 01:17 PM, Luca Puggini wrote: Yes openml seems a better choice. I would really like to have an easy way to import public datasets. I think that fetch_mldata should throw a warning when it is imported

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Luca Puggini
Yes openml seems a better choice. I would really like to have an easy way to import public datasets. I think that fetch_mldata should throw a warning when it is imported if we think this is not working 100%. Best, Luca On Wed, Dec 9, 2015 at 5:35 PM Andreas Mueller wrote: > I guess we use the

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Andreas Mueller
I guess we use the matlab data which is not required by mldata. We could add code that tries to fetch the matlab, and if that doesn't work uses the hdf5, with a soft dependency. Not sure we want that as mldata seems somewhat defunct. Maybe openml would be a better source (maybe once they finish thei
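
[Editor's sketch, not part of the original message.] The fallback idea described above (try the matlab export first, then hdf5 behind a soft dependency) could look roughly like the following. This is a generic illustration, not scikit-learn code: load_first_available, load_matlab_export, and load_hdf5_export are hypothetical names, and the two loaders are stand-ins that do no network access. A real hdf5 loader would import its optional library inside the function, so a missing install would surface as an ImportError and be reported with the other failures.

```python
def load_first_available(loaders):
    """Try each (label, loader) pair in order and return the first success."""
    failures = []
    for label, loader in loaders:
        try:
            return label, loader()
        except (ImportError, OSError, ValueError) as exc:
            # Remember why this format failed so the final error is useful.
            failures.append(f"{label}: {exc}")
    raise RuntimeError("no usable format; " + "; ".join(failures))


# Hypothetical stand-in: pretends the matlab export is missing on the server.
def load_matlab_export():
    raise OSError("HTTP Error 404: matlab export not found")


# Hypothetical stand-in: a real version would import an optional hdf5
# library here (the "soft dependency") and parse the downloaded file.
def load_hdf5_export():
    return {"data": [[5.1, 3.5], [4.9, 3.0]], "target": [0, 0]}


label, dataset = load_first_available(
    [("matlab", load_matlab_export), ("hdf5", load_hdf5_export)]
)
```

Collecting the per-format failures also serves the alternative raised elsewhere in the thread: even without a fallback, the same messages make for a clearer error.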

Re: [Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Sebastian Raschka
Hm, I have problems with that, too. Iris seems to work though. Just checked out the default link where scikit tries to fetch from, it’s http://mldata.org/repository/data/download/matlab/ So, for iris it would be http://mldata.org/repository/data/download/matlab/iris/ but http://mldata.org/r
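
[Editor's sketch, not part of the original message.] The URL pattern described above can be written down as a small helper. The base URL and the iris example are taken from the message; the function name mldata_matlab_url and the percent-encoding of the dataset name are assumptions for illustration. The snippet only builds the string and does not contact mldata.org.

```python
from urllib.parse import quote

# Base URL quoted in the message above.
MLDATA_MATLAB_BASE = "http://mldata.org/repository/data/download/matlab/"


def mldata_matlab_url(dataset_name):
    """Build the matlab download URL for a dataset name (hypothetical helper)."""
    # Percent-encode the name so spaces and similar characters are URL-safe.
    return MLDATA_MATLAB_BASE + quote(dataset_name) + "/"


iris_url = mldata_matlab_url("iris")
mhc_url = mldata_matlab_url("mhc-nips11")
```

Building the URL this way makes it easy to open the link in a browser and check whether a matlab export exists for a given dataset before calling the fetcher.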

[Scikit-learn-general] how to fetch data from mldata

2015-12-09 Thread Luca Puggini
Hi, I am trying to fetch this dataset from mldata: http://mldata.org/repository/data/viewslug/mhc-nips11/ I have tried: data = fetch_mldata('mhc-nips11', data_home=DG.load_path) but I obtain an error: HTTP Error 404: Dataset 'mhc-nips11' not found on mldata.org. I do not understand how to iden