Hello,

I'm a student and would like to work on scikit-learn, as part of the
GSoC program (assuming it gets accepted, but I see no reason to
believe otherwise). I'm in general interested in the scientific Python
community and in machine learning, though I've unfortunately not had
any classes on it (yet!). I am however very willing to learn, and
already have a working knowledge of Python. I've looked at the ideas
page for this year[1], but it seems to be very sparse at the moment. A
more complete list would be immensely helpful in choosing a good
project - I am confident that I would be able to implement some
algorithm even if I haven't encountered it before.

I would also be interested in adding Python 3 support, where progress
is ongoing (as I understand it). I brought this up on IRC, where NelleV
expressed concern that it might not be enough for a GSoC project (but
suggested I write to the list about it regardless). I see the current
progress is a bit of a mess - the readme suggests only some code is
"py3k-ready" and should just be translated directly; setup.py uses
2to3 on all code but I haven't actually succeeded in installing it
because it doesn't play nice with Tox (which is a really nice tool for
automated creation of sandboxed environments). In my humble opinion,
the whole setup.py file should probably be reworked and maybe ported
to distribute -- on the other hand, I've only worked with pure Python
projects so perhaps mixing in Cython brings new challenges.

As far as my experience goes, I participated last year as a GSoC
student for SymPy, which I ported to Python 3 (though a compatible
version isn't released yet; you can read my report at [2] or check out
my blog[3]). Other than the porting itself, I was also concerned with
general improvements to our testing infrastructure and process, and I
think I've been successful enough in my efforts. I have remained a
sporadic contributor after the summer, though school has unfortunately
limited that. In any case, I feel I have good knowledge of the
intricacies of supporting multiple Pythons with a single code-base.

I've downloaded scikit-learn and started playing around with it,
running tests and such, but I haven't had the time to really dive into
it (again, school), so this is mostly an early post to gauge whether
Python 3 porting can be considered a full project or whether I should
seek something else. I intend to contribute at least some code in any
case (not least to satisfy the patch requirement).

Thank you for your time,


[1] https://github.com/scikit-learn/scikit-learn/wiki/A-list-of-topics-for-a-google-summer-of-code-%28gsoc%29-2012
[2] https://github.com/sympy/sympy/wiki/GSoC-2011-report:-Vladimir-Peri%C4%87:-Porting-to-Python-3
[3] http://vperic.blogspot.com/

-- 
Vladimir Perić
