Re: [Scikit-learn-general] (Deep learning) pre-proposal for (GSOC) 2013

Frédéric Bastien Thu, 02 May 2013 07:21:24 -0700

Hi,

I have no doubt you are a great programmer, but even people in my lab, which
specializes in deep learning, wouldn't be able to do the full list while
respecting the scikit-learn code quality, documentation and
performance/efficiency level.


I think you should keep 1 or 2 deep models. Just look at the time it took
for the MLP and RBM PRs to be done. I don't expect less time for yours.

Other people mentioned the problem of usability of deep learning
techniques. I think you should focus on that instead of doing many models.
That is what will differentiate your implementation from
Theano/Pylearn2/DLT. For this, you could check James Bergstra's email on this
list that talks about automatic hyper-parameter selection. I think that could
solve a big part of the usability problem of deep learning. I suppose good
documentation could do the rest.
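
To make "automatic hyper-parameter selection" concrete, here is a minimal
sketch of random search, one simple form of it. This is only an illustration
and not necessarily the method described in that email. The parameter names,
their ranges, and the toy_score function are hypothetical stand-ins for a real
model's validation score.

```python
import math
import random


def sample_config(rng):
    """Draw one hyper-parameter setting (ranges are illustrative assumptions)."""
    return {
        # Learning rate sampled log-uniformly between 1e-4 and 1e-1.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        # Number of hidden units chosen from a small fixed set.
        "n_hidden": rng.choice([64, 128, 256, 512]),
    }


def random_search(score_fn, n_trials=20, seed=0):
    """Return the best (score, config) pair found over n_trials random draws."""
    rng = random.Random(seed)
    best_score, best_config = -math.inf, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


def toy_score(config):
    """Hypothetical validation score; highest near lr=1e-2 and 256 units."""
    lr_term = -(math.log10(config["learning_rate"]) + 2) ** 2
    size_term = -abs(config["n_hidden"] - 256) / 256
    return lr_term + size_term


best_score, best_config = random_search(toy_score, n_trials=50)
print(best_score, best_config)
```

In practice score_fn would train a model with the given settings and return
its score on held-out data, so the user never has to pick the values by hand.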

Also, shared variables are a Theano-only thing. For GPU without Theano, you
can look at NumbaPro, PyCUDA or PyOpenCL. scikit-learn doesn't want Theano
as a dependency (and I understand that).

HTH

Frédéric Bastien

Disclaimer: I'm a core Theano developer. I have never contributed to
scikit-learn, so give other people's comments on this list more weight
than mine.


On Thu, May 2, 2013 at 5:34 AM, Issam <issamo...@gmail.com> wrote:

> Hi Vladn,
>
> Here is the updated proposal. I have added the current challenges and
> proposed solutions to the abstract:
>
>
> https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/issamou/1#
>
> Thank you!
>
> On 5/2/2013 11:34 AM, Vlad Niculae wrote:
> > Sorry, I just saw that your submission is on Melange.
> >
> > I think the proposal could use some discussion on what issues might be
> > faced.  Many people here have expressed concerns about including "deep
> > stuff": the difficulty of having sensible defaults, and the difficulty of
> > having a general-purpose efficient implementation that can be used on
> > different domains without hacking the code.  Even for the very simple RBM,
> > the example is still unsatisfactory because it is hard to show off the
> > algorithm on too small a dataset.  This might be even trickier with
> > deeper things.
> >
> > Tuning a good neural model takes some know-how and tricks;
> > many times you need to look over the training process and measure
> > statistics.  It would be useful to describe these kinds of difficulties
> > and how we might be able to avoid them, what kind of hyperparameter
> > heuristics / initialization should be used, etc.  It is early to go
> > into it too deeply (pun intended) but I think the proposal can benefit
> > from your embracing the skeptical side.
> >
> > Hope this helps,
> > Vlad
> >
> >
> > On Thu, May 2, 2013 at 5:20 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> >> Hi Issam,
> >>
> >> The deadline is fast approaching.  How is your proposal going? Could
> >> you share a version so we can give some feedback?
> >>
> >> Yours,
> >> Vlad
> >>
> >> On Sat, Apr 20, 2013 at 3:57 AM, amir rahimi <noname01....@gmail.com> wrote:
> >>> Sorry, I didn't see Andy's note ;)
> >>>
> >>>
> >>> On Fri, Apr 19, 2013 at 11:23 PM, amir rahimi <noname01....@gmail.com>
> >>> wrote:
> >>>> Hi,
> >>>> I recommend Theano if you want to use python with GPU for deep
> >>>> learning. It is tightly integrated with numpy....
> >>>>
> >>>> Best,
> >>>> Amir
> >>>>
> >>>>
> >>>> On Thu, Apr 18, 2013 at 9:21 PM, Wei LI <kuant...@gmail.com> wrote:
> >>>>> @Andy What do you mean by a "blackbox" algorithm? Does that mean
> >>>>> something similar to pylearn2?
> >>>>>
> >>>>> @Issam, it seems to me that scalability is a key factor in training deep
> >>>>> models and making them work. Do you have any suggestions on how to make
> >>>>> it scalable while still fitting in the sklearn framework? I think sklearn
> >>>>> cannot support GPUs easily. I would like to know: is training a deep
> >>>>> model at a mid-level scale (maybe like CIFAR?) painful on CPU only with
> >>>>> numpy?
> >>>>>
> >>>>> Best,
> >>>>> Wei
> >>>>>
> >>>>> On Fri, Apr 19, 2013 at 12:27 AM, Andreas Mueller
> >>>>> <amuel...@ais.uni-bonn.de> wrote:
> >>>>>> Hi Issam.
> >>>>>> Thank you for your interest. Have you looked at the
> >>>>>> MLP and RBM pull requests that are currently open?
> >>>>>> How would your project relate to those?
> >>>>>>
> >>>>>> A real problem is that we don't want to replicate theano
> >>>>>> and rather have a somewhat "black box" algorithm that people can
> >>>>>> apply....
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Andy
> >>>>>>
> >>>>>>
> >>>>>> On 04/18/2013 06:07 PM, Issam wrote:
> >>>>>>> Hi scikit,
> >>>>>>>
> >>>>>>> Here I am proposing to work on a deep learning topic for GSOC 2013.
> >>>>>>> Deep learning is a relatively new research area that is progressing
> >>>>>>> fast, with a lot of potential for contributions. It involves an
> >>>>>>> interesting idea: trying to imitate the brain by using many levels
> >>>>>>> (hidden layers) of processing, where the levels are in decreasing
> >>>>>>> order of abstraction!
> >>>>>>>
> >>>>>>> In this project, I'm planning to work on each step carefully: first I
> >>>>>>> will look into "Deep Boltzmann machines", then "Deep belief networks",
> >>>>>>> "Deep auto-encoders", "Stacked denoising auto-encoders", and more. I
> >>>>>>> could create a complete plan for this once I get your feedback :)
> >>>>>>>
> >>>>>>> I have been involved in quite a number of machine learning projects,
> >>>>>>> from dealing with imbalanced datasets (software quality prediction)
> >>>>>>> to XML classification, and from recognizing gender from handwriting
> >>>>>>> to breast cancer prediction using mammograms. I'm in my second
> >>>>>>> semester as a graduate student (MSc), and machine learning is my
> >>>>>>> research area. My thesis will involve deep learning, which I will
> >>>>>>> apply to bioinformatics and face recognition.
> >>>>>>>
> >>>>>>> I would be more than happy to work with a mentor on this!
> >>>>>>>
> >>>>>>> Thank you!
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> --Issam Laradji
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> LI, Wei
> >>>>> Tsinghua/CUHK
> >>>>> http://kuantkid.github.com/
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> ----------------------------------------------------------------------
> >>>> #include <stdio.h>
> >>>> double d[]={9299037773.178347,2226415.983937417,307.0};
> >>>> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
> >>>> ----------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> --
> >>> ----------------------------------------------------------------------
> >>> #include <stdio.h>
> >>> double d[]={9299037773.178347,2226415.983937417,307.0};
> >>> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
> >>> ----------------------------------------------------------------------
> >>>
> >>>
> >
>

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
