Hello,

I'm a student and would like to work on scikit-learn as part of the GSoC program (assuming it gets accepted, but I see no reason to believe otherwise). I'm generally interested in the scientific Python community and in machine learning, though I've unfortunately not had any classes on it (yet!). I am, however, very willing to learn, and I already have a working knowledge of Python.

I've looked at the ideas page for this year [1], but it seems to be very sparse at the moment. A more complete list would be immensely helpful in choosing a good project. I am confident that I would be able to implement an algorithm even if I haven't encountered it before.

I would also be interested in adding Python 3 support, where progress is ongoing (as I understand it). I brought this up on IRC, where NelleV expressed concern that it might not be enough for a GSoC project (but suggested I write to the list about it regardless). As far as I can see, the current state is a bit of a mess: the readme suggests only some code is "py3k-ready" and should just be translated directly, while setup.py runs 2to3 on all of the code. I haven't actually succeeded in installing it, because it doesn't play nicely with Tox (a really nice tool for automated creation of sandboxed environments). In my humble opinion, the whole setup.py file should probably be reworked and maybe ported to distribute. On the other hand, I've only worked with pure Python projects, so perhaps mixing in Cython brings new challenges.

As for my experience, I participated in GSoC last year as a student for SymPy, which I ported to Python 3 (though a compatible version hasn't been released yet; you can read my report at [2] or check out my blog [3]). Besides the porting itself, I also worked on general improvements to our testing infrastructure and process, and I think I've been successful enough in my efforts. I have remained a sporadic contributor since the summer, though school has unfortunately limited that. In any case, I feel I have a good knowledge of the intricacies of supporting multiple Pythons with a single code-base.

I've downloaded scikit-learn and started playing around with it, running tests and such, but I haven't had the time to really dive into it (again, school), so this is mostly an early post to gauge whether Python 3 porting can be considered a full project or whether I should look for something else. I intend to contribute at least some code in any case (plus, to satisfy the patch requirement).

Thank you for your time,

[1] https://github.com/scikit-learn/scikit-learn/wiki/A-list-of-topics-for-a-google-summer-of-code-%28gsoc%29-2012
[2] https://github.com/sympy/sympy/wiki/GSoC-2011-report:-Vladimir-Peri%C4%87:-Porting-to-Python-3
[3] http://vperic.blogspot.com/

-- Vladimir Perić

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
