https://jaberg.github.io/skdata/

On Thu, Dec 10, 2015 at 02:12:19PM -0500, Sebastian Raschka wrote:
> In my opinion, we shouldn’t strive for a “general purpose” parser.
> The problem is that websites / data repositories are simply not consistent
> enough. Also, it’s not solely the maintainers’ fault that things are not
> consistent: different data formats (csv, sql, hdf5, json, …) have been
> invented for a reason; sometimes it just makes more sense to store a
> particular dataset in one format or another.

> So, instead of aiming for a general purpose parser, I think it’d be a better
> idea to write a specific function or method for converting a particular
> dataset from a particular source into a scikit-learn compatible format,
> accounting for potential glitches. That said, I think this is way beyond
> the scope of scikit-learn, but it sounds like an interesting idea for a
> side project like "scikit-datasets" or something similar.

> Best,
> Sebastian
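To make the dataset-specific converter suggested above concrete, here is a minimal sketch in Python using only the standard library. The function `load_example_source`, the sample data `RAW_IRIS_LIKE`, and the quirks it handles (a ';' delimiter, a header row, the label in the last column, '?' marking missing values) are all hypothetical; a real loader would encode whatever glitches its own source actually has.

```python
import csv
import io

# Hypothetical raw data from one particular source. Real repositories
# differ in delimiter, header layout and missing-value markers.
RAW_IRIS_LIKE = """sepal_length;sepal_width;label
5.1;3.5;setosa
4.9;?;setosa
6.3;3.3;virginica
"""


def load_example_source(text, missing="?"):
    """Convert one specific (hypothetical) source into (X, y, feature_names).

    Handles only the quirks of this one source: ';' as delimiter, a header
    row, the class label in the last column, and `missing` for absent values.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    feature_names = header[:-1]  # every column except the label
    X, y = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        *features, label = row
        # Map the source-specific missing marker to NaN; parse the rest as floats.
        X.append([float("nan") if v == missing else float(v) for v in features])
        y.append(label)
    return X, y, feature_names
```

The nested lists returned for X and y are array-like, so scikit-learn estimators accept them directly, although the NaN entries still have to be imputed or dropped before most estimators will fit.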

> > On Dec 10, 2015, at 4:06 AM, federico vaggi <vaggi.feder...@gmail.com> 
> > wrote:

> > There was a similar effort here: 
> > https://groups.google.com/forum/#!searchin/keras-users/datasets/keras-users/n6jE9eFcaYI/Roo-rWK6CQAJ
> >  - where someone wrote a small library to abstract the loading of open 
> > source datasets.  While having extra dependencies is something that should 
> > probably be avoided, I don't think it makes a lot of sense to build a lot 
> > of code into scikit-learn to load and fetch datasets (except the truly 
> > common ones like newsgroups/iris, etc).

> > I think with the proliferation of new deep learning libraries that are 
> > cropping up, it would be good to agree on a common format to load/store
> > different datasets without duplicating the effort in many places.

> > On Wed, 9 Dec 2015 at 19:58 Andreas Mueller <t3k...@gmail.com> wrote:


> > On 12/09/2015 01:48 PM, Gael Varoquaux wrote:
> > > On Wed, Dec 09, 2015 at 12:33:55PM -0500, Andreas Mueller wrote:
> > >> I guess we use the MATLAB data, which is not required by mldata.
> > >> We could add code that tries to fetch the MATLAB file and, if that
> > >> doesn't work, falls back to the HDF5 one,
> > > I'd rather not. I'd rather we just have a good error message.

> > Never mind, I don't think this was the problem here anyhow.

> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

