Hi Andreas,
On Tuesday 23 January 2018 09:12 PM, Gaurav Dhingra wrote:
-------- Forwarded Message --------
Subject: Re: [scikit-learn] Topic for thesis work on scikit learn
Date: Tue, 23 Jan 2018 10:16:36 -0500
From: Andreas Mueller <t3k...@gmail.com>
To: Gaurav Dhingra <gauravdhingra.g...@gmail.com>
Hi Gaurav.
Is your mentor experienced in contributing to sklearn?
No, she isn't.
Will they be able to review your code to the scikit-learn standards?
No.
Have you worked on any other pull requests so far?
I've worked on a few. Please have a look at
https://github.com/scikit-learn/scikit-learn/pulls/gxyd; in fact, I expect
that 3 of the open PRs will be merged soon.
Getting anything into scikit-learn without close collaboration with
the community is quite tricky.
Having a faster K-means implementation based on recent research in the
area would be interesting.
There's also interest in adding Robust PCA, probabilistic inference
trees, and improving the Latent Dirichlet Allocation code.
I tried to look into what the /scikit-learn community/devs/ consider a
priority to have in their code-base (instead of looking explicitly
for topics I like). When I looked, I thought of
https://github.com/scikit-learn/scikit-learn/issues/8337 or
https://github.com/scikit-learn/scikit-learn/issues/6557 as possible
topics. But since I'm aware that your unavailability (being busy with
teaching) can be an issue, I simultaneously looked for other
options. I had a conversation with Joel (he was kind enough to PM me);
this is what he said (only the important part of the conversation):
| Tricky things we’ve been trying to do for years:
| * estimator tags
| * sample props
| * tools for optimising cluster parameters (e.g. #6948)
| sample props == #4497 and associated
| related to clusterer parameters, #6160
| estimator tags relates to #6715
| #6777 looks tricky from an ML perspective.
I'm thinking of choosing
https://github.com/scikit-learn/scikit-learn/pull/6948 (ENH optimal
n_clusters value), i.e. completing that PR. If you have the
availability to review my PRs (if I do open them), then I'd be glad to
work with you on either /Conditional inference trees/ or /adding
post-pruning for decision trees/.
I'm aware that, as Joel put it earlier, /Andreas has escaped into the
teaching world/. Anyway, I don't expect my guide to give me feedback on
scikit-learn code, though she will definitely have theoretical
explanations for my questions. Also, since we can have a
co-guide (apart from the local guide), I would definitely consider that
an option for someone from scikit-learn, whether it is you or maybe
Joel. But Joel is also expected to get back to the academic world.
If things don't go well (neither you nor Joel nor anyone else
from the scikit-learn community is available), I'll take a little
longer, but I'll probably get there eventually.
You can find issues on any of these in the issue tracker, which also
has many more feature requests.
Andy
On 12/31/2017 05:46 AM, Gaurav Dhingra wrote:
Hi Andreas,
I think I'll get access to a local mentor from my college, so I think
I can rule that issue out, though for technicalities I would still /like/
to depend more on feedback from the scikit-learn community,
since my aim wouldn't be to make something for my own use but rather
something that would be more useful for the scikit-learn community,
so that it eventually gets merged into master.
I'm currently looking for a topic that I can take up. I tried looking
into the scikit-learn wiki, but it doesn't mention what I'm looking
for (no topic is mentioned). Do you have some topic in mind that
could be a useful addition to scikit-learn? Even if you could
direct me to appropriate links, I would be happy to look into those.
On Wednesday 01 November 2017 01:43 AM, Andreas Mueller wrote:
Hi Gaurav.
Do you have a local mentor? I think having a mentor that can guide
you during a thesis is very important.
You could get some feedback from the community for a contribution,
but that can be slow,
and is entirely on a volunteer basis, so there is no guarantee that
you'll get the necessary feedback in time
to finish your thesis.
Mentoring a thesis - in particular without knowing you - is a
serious commitment, so I'm not sure someone
from inside the project will want to do this. I saw you already made
a contribution in
https://github.com/scikit-learn/scikit-learn/pull/10005
but that's a very different scope than doing what I expect would be
several months of work.
In this regard, I've made a few more contributions; here is the
link: https://github.com/scikit-learn/scikit-learn/pulls/gxyd, though
I know none of them is a big contribution. If you think I should work
on a big enough PR, can you please suggest some issue in that regard?
Thanks.
Best,
Andy
On 10/31/2017 03:31 PM, Gaurav Dhingra wrote:
Hi everyone,
I am a final year (5th year) undergraduate Applied Mathematics
student in India. I am thinking of doing my final year thesis by
doing some work (the coding part) on scikit-learn, so I was wondering
whether anyone could tell me if there are topics available (not
necessarily the names of those topics) that I could work on as an
undergraduate student. I would want to expand upon this in December,
when my exams will be over. But in the meantime I would like to take a
step in that direction just by knowing whether there will be topics
available that I could work on.
It could be that the available topics are not so easy for an
undergraduate; even in that case, I would like to do some research
on the topics first.
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
Gaurav Dhingra
(sent from Thunderbird email client)