Re: [Scikit-learn-general] manifold alignment functionality in scikit learn?

2014-09-16 Thread Satrajit Ghosh
ps. a quick update, the notebook now matches the code in the branch. On Tue, Sep 16, 2014 at 9:54 PM, wrote: > satra, > > thanks so much for pointing me to this. much appreciated! > > best, > kc > > > hi kc, > > > > it's not in scikit learn but we use these quite routinely alongside > > scikit-l

Re: [Scikit-learn-general] manifold alignment functionality in scikit learn?

2014-09-16 Thread kevnull
satra, thanks so much for pointing me to this. much appreciated! best, kc > hi kc, > > it's not in scikit learn but we use these quite routinely alongside > scikit-learn. > > https://github.com/scikit-learn/scikit-learn/pull/2730 > > here is also a notebook showing manifold extraction usin

Re: [Scikit-learn-general] manifold alignment functionality in scikit learn?

2014-09-16 Thread Satrajit Ghosh
hi kc, it's not in scikit learn but we use these quite routinely alongside scikit-learn. https://github.com/scikit-learn/scikit-learn/pull/2730 here is also a notebook showing manifold extraction using diffusion embedding. the notebook is a little out of date with respect to the code. it a
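A minimal in-library sketch of manifold extraction, standing in for the notebook linked above (my illustration: SpectralEmbedding, i.e. Laplacian eigenmaps, is a close relative of diffusion embedding; the actual diffusion-embedding code lives in the linked PR, not in scikit-learn itself):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import SpectralEmbedding

    # Unroll a synthetic 3-D manifold into 2 dimensions.
    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    # SpectralEmbedding stands in for the diffusion embedding of the
    # linked PR/notebook, which is not part of scikit-learn.
    emb = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
    print(emb.shape)  # (1000, 2)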

[Scikit-learn-general] manifold alignment functionality in scikit learn?

2014-09-16 Thread kevnull
Has anyone worked on the problem of manifold alignment? http://en.wikipedia.org/wiki/Manifold_alignment as described in papers like: "Manifold Alignment without Correspondence" http://ijcai.org/papers09/Papers/IJCAI09-214.pdf or "Data Fusion and Multi-Cue Data Matching by Diffusion Maps" http:

Re: [Scikit-learn-general] Manual or automatic curation of references in this mailing list

2014-09-16 Thread Olivier Grisel
Sounds like an interesting idea. It could be done by editing a new wiki page on github: https://github.com/scikit-learn/scikit-learn/wiki -- Olivier

Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

2014-09-16 Thread c TAKES
Yes - in fact my real goal is ultimately to implement RGF, though I had considered building/forking off the current GradientBoostingRegressor code as a starting point: A) because I'm new to developing for scikit-learn and B) to maintain as much consistency as possible with the rest of the package. U

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
If you set the random state and use the same parameters, you are expected to get exactly the same model. To be concrete, if you do est_1 = GradientBoostingClassifier(random_state=0) est_1.fit(X, y) est_2 = GradientBoostingClassifier(random_state=0) est_2.fit(X, y) est_3 = GradientBoostingClassifier(random_state=0)
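A runnable version of the snippet above (the synthetic data via make_classification is my addition):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=200, random_state=0)

    # Same parameters + same random_state => exactly the same model.
    est_1 = GradientBoostingClassifier(random_state=0).fit(X, y)
    est_2 = GradientBoostingClassifier(random_state=0).fit(X, y)
    assert (est_1.predict(X) == est_2.predict(X)).all()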

[Scikit-learn-general] config file management and machine learning?

2014-09-16 Thread Patrick Short
Hi all, Wanted to see if anyone had any resources for best practices in setting up config files in a machine learning context. I am working on an ensemble classifier and trying to figure out the best way to organize training/saving/loading feature-level classifiers and the ensemble classifier via
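One minimal pattern for this (a suggestion only, not from the thread; file names and parameters are hypothetical): keep hyperparameters in a small JSON config and persist fitted estimators with joblib.

    import json
    from joblib import dump, load
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Hypothetical config file holding estimator hyperparameters.
    with open("config.json", "w") as f:
        json.dump({"logistic": {"C": 1.0, "penalty": "l2"}}, f)

    with open("config.json") as f:
        params = json.load(f)["logistic"]

    # Train from the config, then persist the fitted model separately.
    X, y = make_classification(random_state=0)
    clf = LogisticRegression(**params).fit(X, y)
    dump(clf, "logistic.joblib")
    clf = load("logistic.joblib")  # reload later, e.g. inside the ensemble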

Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

2014-09-16 Thread Peter Prettenhofer
The only reference I know is the Regularized Greedy Forest paper by Johnson and Zhang [1]. I haven't read the primary source (by Zhang as well). [1] http://arxiv.org/abs/1109.0887 2014-09-16 15:15 GMT+02:00 Mathieu Blondel : > Could you give a reference for gradient boosting with fully corrective

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Thanks Arnaud, got it. Essentially what you are saying is: while training classifier A, imagine there was a tie at estimator 3 between 2 feature sets, e.g. S1=[1,2,3,4,5,6] and S2=[2,3,4,5,6,7], and S1 was chosen. While training classifier B, there was a tie again at estimator 3 on the same sets and S2 was

Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

2014-09-16 Thread Mathieu Blondel
Could you give a reference for gradient boosting with fully corrective updates? Since the philosophy of gradient boosting is to fit each tree against the residuals (or negative gradient) so far, I am wondering how such a fully corrective update would work... Mathieu On Tue, Sep 16, 2014 at 9:16 AM
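For contrast, a sketch of the plain (non-corrective) stagewise update Mathieu describes, for squared loss; an illustration only, not scikit-learn's implementation:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, random_state=0)
    learning_rate, pred, trees = 0.1, np.zeros_like(y), []
    for m in range(50):
        residuals = y - pred  # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        trees.append(tree)
        # Stagewise: only the new tree is added; earlier trees are frozen.
        pred += learning_rate * tree.predict(X)
    # A fully corrective variant would instead re-optimize the weights of
    # all trees accumulated so far at every iteration.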

Re: [Scikit-learn-general] Multi-target regression

2014-09-16 Thread Anders Aagaard
And after many years of using them both I still get the two confused... Sorry about the noise! ;) On Tue, Sep 16, 2014 at 12:47 PM, Gael Varoquaux < [email protected]> wrote: > On Tue, Sep 16, 2014 at 12:43:49PM +0200, Anders Aagaard wrote: > > I just had a look at this, and the docum

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
During the growth of the decision tree, the best split is searched for within a subset of max_features features sampled from all features. Setting the random_state ensures that the same feature subsets are drawn each time. Note that if several candidate splits have the same score, ties are broken randomly. Setting t
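A quick illustration of that source of randomness (synthetic data is my addition; the two unseeded fits may or may not disagree on a given run):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=300, n_features=30, random_state=0)

    # max_features=6 < 30: each split searches a random 6-feature subset,
    # so two fits without a fixed random_state can build different trees.
    a = GradientBoostingClassifier(max_features=6).fit(X, y)
    b = GradientBoostingClassifier(max_features=6).fit(X, y)
    print((a.predict(X) != b.predict(X)).sum(), "predictions differ")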

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Agreed, Gilles, which is why I later changed to max_features=None. But 6 is a good value: sqrt(36) = 6 ~= sqrt(30) ~= 5.5, and we had 30 features. Generally speaking, if I have 100 estimators (this is from previous experience and also the auto setting on your GBC) and 30 features, 6 should be a good start. But

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Gilles Louppe
Hi Deb, In your case, randomness comes from the max_features=6 setting, which makes the model not very stable from one execution to another, since the original dataset includes about 5x more input variables. Gilles On 16 September 2014 12:40, Debanjan Bhattacharyya wrote: > Thanks Arnaud > > ra

Re: [Scikit-learn-general] Multi-target regression

2014-09-16 Thread Gael Varoquaux
On Tue, Sep 16, 2014 at 12:43:49PM +0200, Anders Aagaard wrote: > I just had a look at this, and the documentation on http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html states y should be "y : array-like, shape = [n_samples]". That's a logistic regre
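A small sketch of the distinction (my example): regressors such as LinearRegression accept a 2-D y for multi-target regression, while LogisticRegression is a classifier despite its name and expects a 1-D y, hence the docstring Anders quoted.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    Y = rng.rand(100, 3)  # three regression targets

    # Multi-target regression: y may have shape (n_samples, n_targets).
    model = LinearRegression().fit(X, Y)
    print(model.predict(X[:2]).shape)  # (2, 3)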

Re: [Scikit-learn-general] Multi-target regression

2014-09-16 Thread Anders Aagaard
I just had a look at this, and the documentation on http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html states y should be "y : array-like, shape = [n_samples]". Did I miss something? I also tried doing it real quick, and it immediately complained on the in

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Thanks Arnaud. random_state is not listed as a parameter on the http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html page, but it is listed as an argument in the constructor. It's probably my fault that I did not notice it as a passable parameter. Maybe th

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
Hi, To get a reproducible model, you have to set the random_state. Best regards, Arnaud On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya wrote: > Hi I recently participated in the Atlas (Higgs Boson Machine Learning > Challenge) > > One of the models I tried was GradientBoostingClassifier. I

[Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Hi, I recently participated in the Atlas (Higgs Boson Machine Learning Challenge). One of the models I tried was GradientBoostingClassifier. I found it extremely non-deterministic. So if I use est = GradientBoostingClassifier(n_estimators=100, max_depth=10, min_samples_leaf=20, max_features=6, verbose

Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

2014-09-16 Thread Arnaud Joly
Hi, There is a very advanced pull request which adds sparse matrix support to decision trees: https://github.com/scikit-learn/scikit-learn/pull/3173 Based on this, it could be possible to have gradient tree boosting working on sparse data. Note that adaboost already supports sparse matrices with non-
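A hedged sketch of that last point, pairing AdaBoost with a sparse-aware base estimator (SGDClassifier is my choice here, not from the thread; note base_estimator was renamed to estimator in later scikit-learn releases):

    import scipy.sparse as sp
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    X_sp = sp.csr_matrix(X)  # sparse input

    # The 'SAMME' algorithm only needs predict(), so a linear model
    # without predict_proba works as the base estimator.
    clf = AdaBoostClassifier(
        base_estimator=SGDClassifier(random_state=0),
        algorithm="SAMME",
        n_estimators=10,
    ).fit(X_sp, y)
    print(clf.score(X_sp, y))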

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-16 Thread Arnaud Joly
I would add to this list: - check_array; - check_consistent_length; - check_X_y. Those are very useful. Arnaud On 15 Sep 2014, at 20:03, Olivier Grisel wrote: > 2014-09-15 6:40 GMT-07:00 Mathieu Blondel : >> lightning is using the following utils: >> >> - check_random_st
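For readers unfamiliar with those helpers, a quick sketch of what they do:

    from sklearn.utils import check_array, check_consistent_length, check_X_y

    X = [[0, 1], [2, 3], [4, 5]]
    y = [0, 1, 0]

    X_arr = check_array(X)         # validate/convert X to a 2-D ndarray
    check_consistent_length(X, y)  # raise if first dimensions disagree
    X_v, y_v = check_X_y(X, y)     # both checks at once, returning arrays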