Hi Andreas,

On Tuesday 23 January 2018 09:12 PM, Gaurav Dhingra wrote:




-------- Forwarded Message --------
Subject:        Re: [scikit-learn] Topic for thesis work on scikit learn
Date:   Tue, 23 Jan 2018 10:16:36 -0500
From:   Andreas Mueller <t3k...@gmail.com>
To:     Gaurav Dhingra <gauravdhingra.g...@gmail.com>



Hi Gaurav.

Is your mentor experienced in contributing to sklearn?


No, she isn't.

Will they be able to review your code to the scikit-learn standards?


No.

Have you worked on any other pull requests so far?


I've worked on a few. Please have a look at https://github.com/scikit-learn/scikit-learn/pulls/gxyd; in fact, I expect that 3 of the open PRs will be merged soon.

Getting anything into scikit-learn without close collaboration with the community is quite tricky.

Having a faster K-means implementation based on recent research in the area would be interesting. There's also interest in adding Robust PCA, probabilistic inference trees, and improving the latent Dirichlet allocation code.


I tried to look into what the scikit-learn community/devs consider a priority to have in their code base (instead of looking explicitly for topics I like). When I looked, I thought of https://github.com/scikit-learn/scikit-learn/issues/8337 or https://github.com/scikit-learn/scikit-learn/issues/6557 as possible topics. But since I'm aware that your unavailability (being busy with teaching) can be an issue, I simultaneously looked for other options. I had a conversation with Joel (he was kind enough to PM me); this is what he said (only the important part of the conversation):

| Tricky things we’ve been trying to do for years:
|     * estimator tags
|     * sample props
|     * tools for optimising cluster parameters (e.g. #6948)
| sample props == #4497 and associated
| related to clusterer parameters, #6160
| estimator tags relates to #6715
| #6777 looks tricky from an ML perspective.

I'm thinking of choosing https://github.com/scikit-learn/scikit-learn/pull/6948 (ENH optimal n_clusters value), i.e. completing that PR. If you are available to review my PRs (if I do open them), then I'd be glad to work with you on either /conditional inference trees/ or /adding post-pruning for decision trees/.

I'm aware, as Joel put it earlier, that /Andreas has escaped into the teaching world/. Anyway, I don't expect my guide to give me feedback on scikit-learn code, though she will definitely be able to give theoretical explanations for my questions. Also, since we can have a co-guide (apart from the local guide), I would definitely consider someone from scikit-learn for that role, whether it is you or maybe Joel. But even Joel is expected to get back to the academic world.

If things don't go well (if neither you nor Joel nor anyone else from the scikit-learn community is available), it's going to take me a little longer, but I'll probably get there eventually.

You can find issues on any of these in the issue tracker, which also has many more feature requests.

Andy


On 12/31/2017 05:46 AM, Gaurav Dhingra wrote:

Hi Andreas,

I think I'll get access to a local mentor from my college, so I think that issue is ruled out. For technical matters, though, I would still /like/ to depend more on feedback from the scikit-learn community, since my aim wouldn't be to make something for my own use but rather something useful to the scikit-learn community, so that it eventually gets merged into master.

I'm currently looking for a topic that I can take up. I tried looking into the scikit-learn wiki, but it doesn't mention what I'm looking for (no topic is mentioned). Do you have some topic in mind that could be a useful addition to scikit-learn? Even if you could direct me to appropriate links, I would be happy to look into those.


On Wednesday 01 November 2017 01:43 AM, Andreas Mueller wrote:
Hi Gaurav.

Do you have a local mentor? I think having a mentor that can guide you during a thesis is very important. You could get some feedback from the community for a contribution, but that can be slow, and is entirely on a volunteer basis, so there is no guarantee that you'll get the necessary feedback in time to finish your thesis.

Mentoring a thesis - in particular without knowing you - is a serious commitment, so I'm not sure someone from inside the project will want to do this. I saw you already made a contribution in https://github.com/scikit-learn/scikit-learn/pull/10005 but that's a very different scope than doing what I expect would be several months of work.

In this regard, I've made a few more contributions (here is the link: https://github.com/scikit-learn/scikit-learn/pulls/gxyd), though I know none of them is a big contribution. If you think I should work on a bigger PR, could you please suggest an issue in that regard?

Thanks.



Best,
Andy

On 10/31/2017 03:31 PM, Gaurav Dhingra wrote:
Hi everyone,

I am a final-year (5th year) undergraduate Applied Mathematics student in India. I am thinking of doing my final-year thesis by doing some work (the coding part) on scikit-learn, so I was wondering if anyone could tell me whether there are available topics (not necessarily the names of those topics) that I could work on as an undergraduate student. I would want to expand upon this in December, when my exams will be over. But in the meantime I would like to take a step in that direction by just knowing whether there will be available topics that I could work on.

It could be that the available topics are not so easy for an undergraduate; even in that case, I would like to do some research on the topics first.


_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

--
Gaurav Dhingra
(sent from Thunderbird email client)

