Hello Jacob,

This is the second draft of my proposal:

Proposal : Second Draft
<https://github.com/amanp10/scikit-learn/wiki/GSoC-2017-:-Parallel-Decision-Tree-Building>

It is still incomplete in some places, mainly in the level of detail, and I will need a little more time for that. Meanwhile, I await your feedback and guidance.

Thank You

On 23 March 2017 at 02:38, Jacob Schreiber <[email protected]> wrote:

> Hi Aman
>
> Likely the easiest way to parallelize decision tree building is to
> parallelize the finding of the best split at each node, as it checks every
> non-constant feature for the best split. Several other approaches focus on
> how to parallelize tree building in the streaming or distributed cases,
> which we are not interested in at the moment (though partially fitting
> decision trees is a good separate project).
>
> As I mentioned in the GitHub issue, it is likely easier to focus on this
> single issue for GSoC as opposed to making it distinct from the multiclass
> prediction, as this will provide similar speedups either way but be more
> general.
>
> It'd be great if you could add your experience directly to the gist and
> perhaps links to prior work if you have any of those.
>
> Something major missing from this is a proposed timeline. Several projects
> fail because they are overly ambitious or not well managed time-wise.
> Showing a timeline will help us manage the project later on, and ensure
> that you're aware of what the steps of the project will be.
>
> Thanks for the effort so far! Let me know when you've made updates.
>
> Jacob
>
> On Wed, Mar 22, 2017 at 12:55 AM, Aman Pratik <[email protected]>
> wrote:
>
>> Hello Developers,
>>
>> This is Aman Pratik. I am currently pursuing my B.Tech at the Indian
>> Institute of Technology, Varanasi. After doing some research I have found
>> some material on Decision Trees and Parallelization. Hence, I propose my
>> first draft for the project "Parallel Decision Tree Building" for GSoC 2017.
>>
>> Proposal : First Draft
>> <https://github.com/amanp10/scikit-learn/wiki/GSoC-2017-:-Parallel-Decision-Tree-Building>
>>
>> Why me?
>>
>> I have been working in Python for the past 2 years and have a good idea
>> about Machine Learning algorithms. I am quite familiar with scikit-learn
>> both as a user and a developer.
>>
>> These are the issues/PRs I have worked on, or am still working on, over
>> the past few months.
>>
>> [MRG+1] Issue#5803 : Regression Test added #8112
>> <https://github.com/scikit-learn/scikit-learn/pull/8112>
>>
>> [MRG] Issue#6673:Make a wrapper around functions that score an individual
>> feature #8038 <https://github.com/scikit-learn/scikit-learn/pull/8038>
>>
>> [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in
>> GaussianProcessRegressor #7997
>> <https://github.com/scikit-learn/scikit-learn/pull/7997>
>>
>> My GitHub Profile: amanp10 <https://www.github.com/amanp10>
>>
>> I have worked with parallelization in one of my PRs, so I am not new to
>> it. I have used Cython a couple of times, though as a beginner. I have not
>> used Decision Trees much but I am familiar with the theory and algorithm.
>> Also, I am familiar with benchmark tests, unit tests and the other
>> technical knowledge I would require for this project.
>>
>> Meanwhile, I have started studying the subject and am gaining experience
>> with Cython. I am looking forward to guidance from the potential mentors or
>> anyone willing to help.
>>
>> Thank You

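For reference, the per-node approach Jacob describes above (checking every non-constant feature for its best split, with the features searched in parallel) could look roughly like the sketch below. This is only an illustration in pure Python, not scikit-learn's implementation: the function names (gini, best_split_for_feature, best_split), the n_jobs argument as used here, the choice of Gini impurity, and the toy data are all assumptions made for the example. Threads are used only to show the structure; under CPython's GIL this pure-Python version will not actually run faster than a serial loop.

```python
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_for_feature(X, y, feature):
    """Return (weighted_impurity, feature, threshold) for one feature,
    or None if the feature is constant at this node."""
    values = sorted(set(row[feature] for row in X))
    if len(values) < 2:
        return None  # constant feature: nothing to split on
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


def best_split(X, y, n_jobs=2):
    """Search every feature concurrently and keep the lowest-impurity split."""
    n_features = len(X[0])
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = pool.map(
            lambda f: best_split_for_feature(X, y, f), range(n_features)
        )
        candidates = [r for r in results if r is not None]
    return min(candidates) if candidates else None


# Hypothetical toy node: feature 0 separates the classes, feature 1 is
# constant, feature 2 is uninformative.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
y = [0, 0, 1, 1]
split = best_split(X, y)
print(split)
```

Each feature's search is independent of the others, so the only shared step is the final reduction that picks the lowest-impurity candidate; that is what makes this the simplest place to parallelize.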
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
