Re: [Scikit-learn-general] (Deep learning) pre-proposal for (GSOC) 2013

Andy Mueller Sun, 05 May 2013 15:02:51 -0700

Hi Issam.
Please stay on the list ;)

Sorry I have been so critical of your proposal.
Merging the two PRs is a good proposal.


I just don't think the "speeding up" is realistic.
As I said, we will not include Numba in scikit-learn any time soon.
And we will not include GPU implementations in scikit-learn any time 
soon. (I think there is a consensus on this for the time being.)

As for the algorithms: deep belief nets are not really much to program. 
The recipe is "train a couple of RBMs, then throw the weights into a neural 
network". So implementing a DBN is more about getting the API of RBMs 
and MLPs right.
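
To make the "train a couple of RBMs, then throw the weights into a neural
network" recipe concrete, here is a minimal NumPy sketch. It is only an
illustration, not the code from either pull request: the function names
(train_rbm, pretrain_dbn, mlp_forward), the CD-1 update and the
hyperparameters are all assumptions picked for brevity, and the supervised
fine-tuning step is left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(X, n_hidden, n_epochs=10, lr=0.1, rng=None):
    """Train a Bernoulli RBM with one-step contrastive divergence (CD-1).

    Hypothetical helper for illustration; returns the weight matrix and
    the hidden biases, which is all the MLP initialisation needs.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_visible = X.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(n_epochs):
        # Positive phase: hidden activations given the data.
        h_prob = sigmoid(X @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible layer.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 parameter updates, averaged over the batch.
        W += lr * (X.T @ h_prob - v_recon.T @ h_recon) / len(X)
        b_vis += lr * (X - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid


def pretrain_dbn(X, layer_sizes, rng=None):
    """Greedily train a stack of RBMs, each on the previous layer's output."""
    rng = np.random.default_rng(0) if rng is None else rng
    params = []
    layer_input = X
    for n_hidden in layer_sizes:
        W, b = train_rbm(layer_input, n_hidden, rng=rng)
        params.append((W, b))
        layer_input = sigmoid(layer_input @ W + b)
    return params


def mlp_forward(X, params):
    """Forward pass of an MLP whose layers are initialised from the RBMs."""
    a = X
    for W, b in params:
        a = sigmoid(a @ W + b)
    return a


# Toy binary data: 20 samples, 6 features.
X = (np.random.default_rng(1).random((20, 6)) > 0.5).astype(float)
params = pretrain_dbn(X, layer_sizes=[4, 3])
hidden = mlp_forward(X, params)  # shape (20, 3)
```

The point of the sketch is that the glue code is tiny once an RBM can hand
over its weights and biases and an MLP can accept them as initial values,
which is why the API matters more than the algorithm here.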

Sorry, I am a bit behind on my scikit-learn mail and haven't followed 
the GSOC process much.
Maybe some of the mentors can say more about the current state?

Cheers,
Andy

On 05/04/2013 09:17 AM, Issam wrote:
> Hi Andy,
>
> I agree, Numba Pro is commercial; I included it because someone suggested 
> using it in the comments :). But the idea is to provide a speed boost, 
> for which basic Numba might suffice.
>
> As per the schedule I devised, the first few weeks are dedicated to 
> pulling in MLP and RBM, documenting them, testing them on public 
> datasets, fixing any serious defects and publishing them for scikit 
> users to use. So those two pull requests will be the starting 
> points.
>
> Even though I am not very deep into "Deep Learning" yet, I consider myself 
> quite an expert in Neural Networks (which is what deep learning is about). 
> Also, I have a professor specialized in DL whom I'm consulting 
> regularly.
> The reason for not giving a more detailed proposal is that I'm 
> having a busy semester, which will end on May 26, when I will be 
> submitting 3 papers on Machine Learning. From May 26 onwards, I would 
> be devoting myself full time to deep learning.
>
> You say the Theano developers took several years establishing Theano, but 
> deep learning is a relatively new area, so they had to write the 
> algorithms from scratch, and probably took a long time developing the 
> API, website, servers and the infrastructure of Theano, which scikit 
> already has.
> In addition, I will only be developing the main deep learning 
> algorithm, for instance "Deep Belief Nets". Since its full 
> description/pseudocode is already out there, the main difficulty will 
> be integrating it into scikit rather than the actual implementation.
>
> Anyhow, if this doesn't work out, I would be implementing Deep Learning 
> algorithms anyway throughout the summer :), since it's my future area 
> of research.
>
> I only request a mentor to kindly familiarize me with the 
> conventions that scikit expects for pushing code into the 
> repository; everything else I can do alone or with my supervisor, 
> who is an expert in the area :).
>
> Thanks,
> Yours truly,
>
> --Issam
>
>
> On 5/4/2013 9:46 AM, Andy wrote:
>> On 05/03/2013 10:30 PM, Issam wrote:
>>> Hi Andy,
>>>
>>> The main idea behind proposing GPU techniques is to have an 
>>> efficient implementation. It doesn't need to be GPU techniques; 
>>> any technique, such as Numba Pro, that speeds up deep learning 
>>> algorithms, which demand a lot of computation, would do.
>>>
>> Sorry, I think you misunderstood my remark.
>> I was not suggesting to use Numba Pro. This is a commercial product 
>> and definitely can not be used in scikit-learn.
>> I was rather asking how you would go about doing an efficient 
>> implementation.
>> This is something that needs to be well thought-through, rather than 
>> an afterthought.
>> Afaik several labs (!) have been working on theano for several years 
>> (!).
>>
>>> So are you suggesting that there would be no mentor for this 
>>> project? The objective is mainly to get scikit-learn started with 
>>> general-purpose deep learning algorithms...
>> I don't know. Is there a mentor? Who?
>>
>> I know a lot of the core people (Lars, Gael, Olivier, me) have been 
>> very busy and the GSOC is not as organized as it should be.
>> But mentors should be assigned prior to application.
>>
>> For getting sklearn started with deep learning:
>> Let me reiterate: there are MLP and RBM implementations. The RBM is 
>> done. It only needs finishing touches.
>> It was done by a deep learning expert, with reviews of people active 
>> in the field.
>> We really focused on having a working, well-documented, 
>> well-illustrated numpy implementation at first.
>> I am not entirely certain about the current state of the MLP pull 
>> request, but I think there are now several working versions of it.
>>
>> Any deep learning contributions to scikit-learn should take these two 
>> pull requests as starting points.
>>
>> Cheers,
>> Andy
>


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
