Re: [Scikit-learn-general] (Deep learning) pre-proposal for (GSOC) 2013

Frédéric Bastien Thu, 02 May 2013 10:36:34 -0700

I just thought that you could concentrate on the DL algorithms that stack
MLPs and RBMs, as there is a PR for each of them.


So this would mean helping to finish them. I think it is better to have
fewer but finished models than more models.

Don't forget my previous disclaimer.

Fred


On Thu, May 2, 2013 at 12:34 PM, Frédéric Bastien <no...@nouiz.org> wrote:

> Sorry, but I'm not a DL expert and can't make such a recommendation.
>
> Maybe someone else here can, but you can also ask on the pylearn2 mailing
> list.
>
> Fred
>
>
> On Thu, May 2, 2013 at 11:12 AM, Issam <issamo...@gmail.com> wrote:
>
>>  Hi,
>>
>> Thanks a lot for the comment. I hope this doesn't open a new thread :),
>> I'm pretty new to using mailing lists.
>>
>> You are right that I'm underestimating the development time needed to
>> craft efficient, usable DL algorithms.
>>
>> So I would like to ask which deep models you would recommend I focus on
>> within the given time frame.
>>
>> Thanks a lot,
>> yours truly
>> --Issam
>>
>> On 5/2/2013 5:20 PM, Frédéric Bastien wrote:
>>
>>    Hi,
>>
>>  I have no doubt you are a great programmer, but even people in my lab,
>> which specializes in deep learning, wouldn't be able to do the full list
>> while respecting the scikit-learn code quality, documentation and
>> performance/efficiency level.
>>
>>  I think you should keep 1 or 2 deep models. Just look at the time it took
>> for the MLP and RBM PRs to be done. I don't expect less time for yours.
>>
>>  Other people mentioned the problem of usability of deep learning
>> techniques. I think you should focus on that instead of doing many models.
>> That is what will differentiate your implementation from
>> Theano/Pylearn2/DLT. For this, you could check James Bergstra's email on
>> this list that talks about automatic hyper-parameter selection. I think
>> that could solve a big part of the usability problem of deep learning. I
>> suppose good documentation could do the rest.
>>
>>  Also, shared variables are a Theano-only thing. For GPU without Theano,
>> you can look at NumbaPro, PyCUDA or PyOpenCL. scikit-learn doesn't want
>> Theano as a dependency (and I understand that).
>>
>>  HTH
>>
>> Frédéric Bastien
>>
>> Disclaimer: I'm a core Theano developer. I have never contributed to
>> scikit-learn, so give other people's comments on this list more weight
>> than mine.
>>
>>
>> On Thu, May 2, 2013 at 5:34 AM, Issam <issamo...@gmail.com> wrote:
>>
>>> Hi Vlad,
>>>
>>> Here is the updated proposal. I have added the current challenges and
>>> proposed solutions to the abstract.
>>>
>>>
>>> https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/issamou/1#
>>>
>>> Thank you!
>>>
>>> On 5/2/2013 11:34 AM, Vlad Niculae wrote:
>>> > Sorry, I just saw that your submission is on Melange.
>>> >
>>> > I think the proposal could use some discussion of what issues might be
>>> > faced.  Many people here have expressed concerns about including "deep
>>> > stuff": the difficulty of having sensible defaults, and the difficulty
>>> > of having a general-purpose efficient implementation that can be used on
>>> > different domains without hacking the code.  Even for the very simple
>>> > RBM, the example is still unsatisfactory because it is hard to show off
>>> > the algorithm on too small a dataset.  This might be even trickier with
>>> > deeper things.
>>> >
>>> > In tuning a good neural model, some know-how and tricks are needed;
>>> > many times you need to look over the training process and measure
>>> > statistics.  It would be useful to describe these kinds of difficulties
>>> > and how we might be able to avoid them, what kind of hyperparameter
>>> > heuristics / initialization should be used, etc.  It is early to go
>>> > into it too deeply (pun intended) but I think the proposal can benefit
>>> > from your embracing the skeptical side.
>>> >
>>> > Hope this helps,
>>> > Vlad
>>> >
>>> >
>>> > On Thu, May 2, 2013 at 5:20 PM, Vlad Niculae <zephy...@gmail.com> wrote:
>>> >> Hi Issam,
>>> >>
>>> >> The deadline is fast approaching.  How is your proposal going? Could
>>> >> you share a version so we can give some feedback?
>>> >>
>>> >> Yours,
>>> >> Vlad
>>> >>
>>> >> On Sat, Apr 20, 2013 at 3:57 AM, amir rahimi <noname01....@gmail.com> wrote:
>>> >>> Sorry, I didn't see Andy's note ;)
>>> >>>
>>> >>>
>>> >>> On Fri, Apr 19, 2013 at 11:23 PM, amir rahimi <noname01....@gmail.com>
>>> >>> wrote:
>>> >>>> Hi,
>>> >>>> I recommend Theano if you want to use Python with a GPU for deep
>>> >>>> learning. It is tightly integrated with numpy....
>>> >>>>
>>> >>>> Best,
>>> >>>> Amir
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Apr 18, 2013 at 9:21 PM, Wei LI <kuant...@gmail.com> wrote:
>>> >>>>> @Andy What do you mean by a "blackbox" algorithm? Does that mean
>>> >>>>> something similar to pylearn2?
>>> >>>>>
>>> >>>>> @Issam, it seems to me that scalability is a key factor in training
>>> >>>>> deep models and making them work. Do you have any suggestions for how
>>> >>>>> to make it scalable while still fitting into the sklearn framework? I
>>> >>>>> think sklearn cannot support GPUs easily. I would like to know whether
>>> >>>>> training a deep model at a mid-level scale (maybe like CIFAR?) is
>>> >>>>> painful on CPU only with numpy.
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> Wei
>>> >>>>>
>>> >>>>> On Fri, Apr 19, 2013 at 12:27 AM, Andreas Mueller
>>> >>>>> <amuel...@ais.uni-bonn.de> wrote:
>>> >>>>>> Hi Issam.
>>> >>>>>> Thank you for your interest. Have you looked at the
>>> >>>>>> MLP and RBM pull requests that are currently open?
>>> >>>>>> How would your project relate to those?
>>> >>>>>>
>>> >>>>>> A real problem is that we don't want to replicate Theano
>>> >>>>>> and rather have a somewhat "black box" algorithm that people can
>>> >>>>>> apply....
>>> >>>>>>
>>> >>>>>> Cheers,
>>> >>>>>> Andy
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On 04/18/2013 06:07 PM, Issam wrote:
>>> >>>>>>> Hi scikit,
>>> >>>>>>>
>>> >>>>>>> Here I am proposing to work on the topic of deep learning for GSOC
>>> >>>>>>> 2013. Deep learning is a relatively new research area that is
>>> >>>>>>> progressing fast, with a lot of potential for contributions. It
>>> >>>>>>> involves an interesting idea: trying to imitate the brain by using
>>> >>>>>>> many levels (hidden layers) of processing, where the levels are in
>>> >>>>>>> decreasing order of abstraction!
>>> >>>>>>>
>>> >>>>>>> In this project, I'm planning to work on each step carefully. First I
>>> >>>>>>> will look into "Deep Boltzmann machines", then "Deep belief networks",
>>> >>>>>>> "Deep auto-encoders", "Stacked denoising auto-encoders", and more. I
>>> >>>>>>> could create a complete plan for this once I get your feedback :)
>>> >>>>>>>
>>> >>>>>>> I have been involved in quite a number of machine learning projects,
>>> >>>>>>> from dealing with imbalanced datasets (software quality prediction)
>>> >>>>>>> to XML classification, and from recognizing gender from handwriting
>>> >>>>>>> to breast cancer prediction using mammograms. I'm in my second
>>> >>>>>>> semester as a graduate student (MSc), and machine learning is my
>>> >>>>>>> research area. My thesis would involve deep learning, which I will
>>> >>>>>>> apply to bioinformatics and face recognition.
>>> >>>>>>>
>>> >>>>>>> I would be more than happy to work with a mentor on this!
>>> >>>>>>>
>>> >>>>>>> Thank you!
>>> >>>>>>>
>>> >>>>>>> Best regards,
>>> >>>>>>> --Issam Laradji
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> ------------------------------------------------------------------------------
>>> >>>>>>> Precog is a next-generation analytics platform capable of
>>> advanced
>>> >>>>>>> analytics on semi-structured data. The platform includes APIs for
>>> >>>>>>> building
>>> >>>>>>> apps and a phenomenal toolset for data science. Developers can
>>> use
>>> >>>>>>> our toolset for easy data analysis & visualization. Get a free
>>> >>>>>>> account!
>>> >>>>>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>> >>>>>>> _______________________________________________
>>> >>>>>>> Scikit-learn-general mailing list
>>> >>>>>>> Scikit-learn-general@lists.sourceforge.net
>>> >>>>>>>
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> LI, Wei
>>> >>>>> Tsinghua/CUHK
>>> >>>>> http://kuantkid.github.com/
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>>
>>> ----------------------------------------------------------------------
>>> >>>> #include <stdio.h>
>>> >>>> double d[]={9299037773.178347,2226415.983937417,307.0};
>>> >>>> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
>>> >>>>
>>> ----------------------------------------------------------------------
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>>
>>> ----------------------------------------------------------------------
>>> >>> #include <stdio.h>
>>> >>> double d[]={9299037773.178347,2226415.983937417,307.0};
>>> >>> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
>>> >>>
>>> ----------------------------------------------------------------------
>>> >>>
>>> >>>
>>> >>>
>>> >
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Introducing AppDynamics Lite, a free troubleshooting tool for Java/.NET
>>> Get 100% visibility into your production application - at no cost.
>>> Code-level diagnostics for performance bottlenecks with <2% overhead
>>> Download for free and get started troubleshooting in minutes.
>>> http://p.sf.net/sfu/appdyn_d2d_ap1
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>>
>>
>

