Unfortunately, the most important parameters to adjust to maximize
accuracy are often those controlling the randomness of the algorithm,
e.g. max_features, for which this strategy is not possible.
That being said, in the case of boosting, I think this strategy would
be worth automating, e.g. to
Hi Eskil,
(CC: the scikit-learn mailing list)
Unfortunately, I would not have time myself to implement this new
criterion. In any case, given the recent publication of this paper, I
don't think we would add it to the scikit-learn codebase. Our policy is
to only include time-tested algorithms.
Hi,
Before going further, what version of scikit-learn are you using? We
did a major update of the GP code in 0.18-dev.
Best,
Gilles
On 6 January 2016 at 05:01, Zafer Leylek
wrote:
> Just going over the scikit GaussianProcess code and comparing the results
>
Hi Jeff,
In general, most implementations of predict_proba are some proxy of the
conditional probability p(y|x). Some of them model this quantity quite
well (e.g., Gaussian processes), while for some others it is closer to
a heuristic than to the actual p(y|x) (e.g., with linear models).
Hi Sebastian,
Yes. This is intentional. The motivation comes from
http://link.springer.com/article/10.1007/s10994-006-6226-1#/page-1
where it is shown experimentally that it is a good default value on
average.
Gilles
On 13 November 2015 at 11:17, Sebastian Raschka wrote:
>
Congratulations! I wish I could be there next week to offer you beers :(
Gilles
On 17 October 2015 at 18:51, Gael Varoquaux
wrote:
> Thanks a lot to the team that pulled out this beta release. I know that
> it was a lot of work with a huge amount of bug fixing.
Welcome to both of you Tom and Jan!
On 23 September 2015 at 07:45, Jan Hendrik Metzen
wrote:
> Hi everyone,
> thanks a lot; I am glad to be part of such a great team and looking
> forward to continue to work with you guys!
>
> Cheers,
> Jan
>
> On 22.09.2015 19:16,
Hi Olivier,
It seems the 3 PRs you mentioned are now closed/merged. Are there
other blocking PRs you need us to look at before freezing for the
release?
Cheers,
Gilles
On 4 September 2015 at 12:16, Olivier Grisel wrote:
> Hi all,
>
> It's been a while since we have
Hi,
> But the question is how to make the scikit-learn code, decisionTree Regressor
> for example, running in distributed computing mode, to benefit the power of
> Spark?
I am sorry but you can't. The tree implementation in scikit-learn was
not designed for this use case.
Maybe you should have
Hi Rex,
This is currently not supported in scikit-learn.
Gilles
On 12 September 2015 at 05:02, Rex X wrote:
> Given categorical attributes, for instance
> city = ['a', 'b', 'c', 'd', 'e', 'f']
>
> With DictVectorizer(), we can transform "city" into a sparse matrix, using
>
Here is a sample code on how to retrieve the nodes traversed by a given sample:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier().fit(X, y)

def path(tree, sample):
    # Walk down from the root, following the split rule at each node
    nodes = []
    node = 0
    while tree.children_left[node] != -1:  # -1 marks a leaf
        nodes.append(node)
        if sample[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    nodes.append(node)
    return nodes
Also, have a look at the documentation here
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L3205
to understand the structure of the tree_ object.
On 31 August 2015 at 08:55, Gilles Louppe <g.lou...@gmail.com> wrote:
> Here is a sample code on how to
Hi,
The simplest method to get what you are looking for is to re-propagate the
training samples into the tree and keep track of the nodes they
traverse. You should have a look at the implementation of `apply` to
get started.
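As a rough sketch of that idea (using the estimator-level `apply`, which maps samples to leaf ids in recent versions; the dataset is just for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Leaf id reached by each training sample
leaf_ids = clf.apply(X)

# Training samples that end up in the same leaf as sample 0
same_leaf = np.where(leaf_ids == leaf_ids[0])[0]
```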
Hope this helps,
Gilles
On 30 August 2015 at 21:55, Rex X dnsr...@gmail.com
(Also, this can be done in Python code, by using the interface we
provide for the tree_ object)
On 30 August 2015 at 22:22, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
The simplest method to get what you are looking for is to re-propagate the
training samples into the tree and keep track
Hi Sebastian,
Indeed, N samples are drawn with replacement, where N=len(original
training set). I guess we could add an extra max_samples parameter,
just like we have for the Bagging estimators.
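For reference, a sketch of the Bagging API referred to above (the dataset and parameter values are purely illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each tree sees a bootstrap sample of 50% of the rows and 2 of the features
clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=25,
    max_samples=0.5,
    max_features=2,
    bootstrap=True,
    random_state=0,
).fit(X, y)
```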
Gilles
On 6 July 2015 at 23:00, Sebastian Raschka se.rasc...@gmail.com wrote:
Thanks, Jeff, that
Hi Sebastian,
Both terminologies are in fact strictly equivalent for regression. See
e.g. page 46 of http://arxiv.org/abs/1407.7502
Best,
Gilles
On 9 July 2015 at 18:56, Sebastian Raschka se.rasc...@gmail.com wrote:
Hi, all,
sorry, but I have another question regarding the terminology in the
Hi,
Since the last version, scikit-learn provides an `apply` method for
the classifier itself, hence preventing users from shooting themselves
in the foot :)
So basically, you can replace clf.tree_.apply(X_train) with
clf.apply(X_train) and it should work.
Hope this helps,
Gilles
On 23 May
Hi Trevor,
I am only speaking for myself, not on behalf of the scikit-learn project,
but I would be +1 for your project and use of the -learn suffix. The pros
you cite are in my opinion more important than the cons.
Cheers,
Gilles
On 28 April 2015 at 05:33, Trevor Stephens
Hi Luca,
If you want to find all relevant features, I would recommend using
ExtraTreesClassifier with max_features=1 and limited depth in order to
avoid this kind of bias due to estimation errors. E.g., try with
max_depth=3 to 5 or using max_leaf_nodes.
Hope this helps,
Gilles
On 19 April
Hi,
In general, I agree that we should at least add a way to compute feature
importances using permutations. This is an alternative, yet standard, way
to do it in comparison to what we do (mean decrease of impurity, which is
also standard).
Assuming we provide permutation importances as a
Congratulations to everyone involved! Kudos to Andy, Olivier and Joel
for their continuous work these last months :)
On 27 March 2015 at 19:01, Alexandre Gramfort
alexandre.gramf...@telecom-paristech.fr wrote:
:beers: !
A
Hi Luca,
On 6 March 2015 at 11:09, Luca Puggini lucapug...@gmail.com wrote:
Hi,
It seems to me that you are discussing topics that can be introduced in
sklearn with GSoC.
I use sklearn quite a lot and there are a couple of things that I really
miss in this library:
1- Nipals PCA.
The
Yes, in fact I did something similar in my thesis. See section 7.2 for
a discussion about this. Figure 7.5 is similar to what you describe in
your sample code. By varying the depth, you can basically control the
bias.
http://orbi.ulg.ac.be/bitstream/2268/170309/1/thesis.pdf
On 6 March 2015 at
Hi Pierre,
While the name is different, the MSE criterion is strictly equivalent
to the reduction of variance. The only difference is that we do not
divide by var{y|S} because this factor is the same for all splits and
all features, hence the maximizer is the same.
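In symbols (a sketch using the notation above): for a split of a node S into S_L and S_R, the relative reduction of variance is

```latex
\Delta(S) = \frac{ \operatorname{Var}\{y|S\}
  - \frac{|S_L|}{|S|}\operatorname{Var}\{y|S_L\}
  - \frac{|S_R|}{|S|}\operatorname{Var}\{y|S_R\} }{ \operatorname{Var}\{y|S\} }
```

Since the denominator Var{y|S} is identical for every candidate split of S, dropping it (which is what the MSE criterion does) does not change which split maximizes the quantity.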
Cheers,
Gilles
On 24
Thanks for the report. I can indeed reproduce the issue -- _tree.pyx
no longer compiles with Cython 0.22.
(The current _tree.c code was compiled with an older version of Cython.)
On 23 February 2015 at 07:02, Zay Maung Maung Aye zmm...@gmail.com wrote:
Hi Everyone,
I downloaded the
On 11 February 2015 at 22:22, Timothy Vivian-Griffiths
vivian-griffith...@cardiff.ac.uk wrote:
Hi Gilles,
Thank you so much for clearing this up for me. So, am I right in thinking
that the feature selection is carried for every CV-fold, and then once the
best parameters have been found, the
Hi Tim,
On 9 February 2015 at 19:54, Timothy Vivian-Griffiths
vivian-griffith...@cardiff.ac.uk wrote:
Just a quick follow up to some of the previous problems that I have had:
after getting some kind assistance at the PyData London meetup last week, I
found out why I was getting different
Hi Miquel,
These options are not available within
RandomForestClassifier/Regressor. By default, len(X) samples are drawn
with replacement.
However, you can achieve what you look for using
BaggingClassifier(base_estimator=DecisionTreeClassifier(...),
max_samples=..., max_features=...), where max_samples
Hi,
I confirm what has been said before. Samples are not stored anywhere
in the leaves -- only the final prediction along with some statistics.
To do what you want, you have to recompute the distribution yourself,
e.g. using apply and then grouping by leaf ids.
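A minimal sketch of that recomputation (dataset and depth are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Re-propagate the training set and group the targets by leaf id
leaf_ids = clf.apply(X)
distribution = {leaf: y[leaf_ids == leaf] for leaf in np.unique(leaf_ids)}
```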
Gilles
On 15 October 2014 02:25,
Hi Deb,
In your case, randomness comes from the max_features=6 setting, which
makes the model not very stable from one execution to another, since
the original dataset includes about 5x more input variables.
Gilles
On 16 September 2014 12:40, Debanjan Bhattacharyya b.deban...@gmail.com wrote:
Hi Luca,
The best strategy consists in finding the best threshold, i.e. the one
that maximizes the impurity decrease, when trying to partition a node into
left and right child nodes. By contrast, random does not look for the
best split and simply draws the discretization threshold at random.
For
Yes, exactly.
On 12 September 2014 at 18:31, Luca Puggini lucapug...@gmail.com wrote:
Hey thanks a lot,
so basically in random Forest the split is done like in the algorithm
described in your thesis except that the search is not done on all the
variables but only on a random subset of them?
Hi Luca,
This may not be the fastest implementation, but random forest
proximities can be computed quite straightforwardly in Python given
our 'apply' function.
See for instance
https://github.com/glouppe/phd-thesis/blob/master/scripts/ch4_proximity.py#L12
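For the record, a sketch of Breiman-style proximities built on `apply` (not the linked script itself; dataset and forest size are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# leaves[i, t] = leaf reached by sample i in tree t
leaves = forest.apply(X)

# proximity(i, j) = fraction of trees in which i and j share a leaf
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
```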
From a personal point of view, I never
I am rather -1 on making this a transform. There are many ways to come
up with proximity measures in forests -- in fact, I don't think
Breiman's is particularly well designed.
On 8 September 2014 16:52, Gael Varoquaux gael.varoqu...@normalesup.org wrote:
On Mon, Sep 08, 2014 at 11:49:26PM +0900,
of the two samples.
On 8 September 2014 17:03, Mathieu Blondel math...@mblondel.org wrote:
On Mon, Sep 8, 2014 at 11:55 PM, Gilles Louppe g.lou...@gmail.com wrote:
I am rather -1 on making this a transform. There are many ways to come
up with proximity measures in forests -- in fact, I don't
Hi Kevin,
Interesting question. Your point is true provided you have an infinite
amount of training data. In that case, you can indeed show that an
infinitely large forest of extremely randomized trees built for K=1
converges towards an optimal model (the Bayes model).
This result however does
Hi Pranav,
You should increase the number of trees. By default, it is set to 10,
which would explain why you don't reach higher precision.
Best,
Gilles
On 3 June 2014 07:32, Pranav O. Sharma emailpra...@gmail.com wrote:
Hi,
I'm trying to use
[ 0.4 0.6 ]
[ 0.41 0.59]
[ 0.65 0.35]
[ 0.52 0.48]
[ 0.42 0.58]
[ 0.49 0.51]
[ 0.19 0.81]
[ 0.71 0.29]
[ 0.24 0.76]]
On Mon, Jun 2, 2014 at 11:08 PM, Gilles Louppe g.lou...@gmail.com wrote:
Hi Pranav,
You should increase the number of trees. By default, it is set
Why do you want to put a random forest in a numpy array in the first place?
Best,
Gilles
On 26 May 2014 13:11, Lars Buitinck larsm...@gmail.com wrote:
2014-05-24 0:28 GMT+02:00 Steven Kearnes skear...@gmail.com:
a is a list of the individual DecisionTreeClassifier objects belonging to
the
Hi Lars,
Thanks! Oh, I would be interested in seeing them. Could you send me the link if
you still have them?
Thanks,
Gilles
On 23 May 2014 11:05, Lars Buitinck larsm...@gmail.com wrote:
2014-05-22 8:13 GMT+02:00 Gilles Louppe g.lou...@gmail.com:
Just for letting you know, my talk Accelerating
Thanks! This is really cool! I think I'll try to reproduce some of them and
put one or two in my slides.
On 23 May 2014 11:29, Lars Buitinck larsm...@gmail.com wrote:
2014-05-23 11:08 GMT+02:00 Gilles Louppe g.lou...@gmail.com:
Thanks! Oh, I would be interested in seeing them. Could you send me
Hi Tim,
In principle, what you describe corresponds exactly to the decision tree
algorithm. You partition the input space into smaller subspaces, on which
you recursively build sub-decision trees.
In practice however, I would not split things by hand, unless you are
interested in discovering
Hi folks,
Just for letting you know, my talk Accelerating Random Forests in
Scikit-Learn was approved for EuroScipy'14. Details can be found at
https://www.euroscipy.org/2014/schedule/presentation/9/.
My slides are far from being ready, but my intention is to present our
team efforts on the tree
Hi,
Can you try 0.15-dev to see if it solves your issues? We have changed
the backend for parallelizing trees.
Gilles
On 18 April 2014 23:13, Zygmunt Zając zajac.zygm...@gmail.com wrote:
Hi,
When I train a random forest, I'd like it to use all the cores. I set
n_jobs = -1, but it doesn't
Hi Satra,
In case of Extra-Trees, changing the scale of features might change
the result when the transform you apply distorts the original feature
space. Drawing a threshold uniformly at random in the original
[min;max] interval won't be equivalent to drawing a threshold in
[f(min);f(max)] if f
On 12 March 2014 13:08, Felipe Eltermann felipe.elterm...@gmail.com wrote:
Hello Vamsi,
Firstly, regarding the implementation of sparse functions. _tree.pxy is the
back end cython code to handle the operations Splitting, Evaluating
impurities at nodes and then constructing the tree.
That's
Dear Vincent,
On 6 February 2014 17:46, Vincent Arel vincent.a...@gmail.com wrote:
Hi all,
Gilles Louppe[1] suggests that feature importance in random forest
classifiers is calculated using the algorithm of Breiman (1984). I
imagine this is the same as formula 10.42 on page 368 of Hastie et
Vincent,
I identified the bug and opened an issue at
https://github.com/scikit-learn/scikit-learn/issues/2835
I will try to fix this in the next days.
Sorry for the inconvenience.
Gilles
On 6 February 2014 18:18, Gilles Louppe g.lou...@gmail.com wrote:
Dear Vincent,
On 6 February 2014 17
Hi Pablo,
I am not sure re-implementing a new criterion is what you are looking
for. Criteria are made to evaluate the goodness of a split (i.e., a
binary partition of the samples in the current node) in terms of
impurity with regards to the output variable - not the inputs.
What you should do
originally designed to handle categorical
variables properly...)
Cheers,
Pablo
On 29 January 2014 20:30, Gilles Louppe g.lou...@gmail.com wrote:
Hi Pablo,
I am not sure re-implementing a new criterion is what you are looking
for. Criteria are made to evaluate the goodness of a split (i.e
Given our intent to release 1.0 in the near future, I think we should
also make it clear in the wiki page that adding more and more
algorithms is not exactly the direction in which we are going.
Maybe this is the opportunity to remove some of the old subjects from
2013 and instead add topics
How much code in our current implementation depends on the data
representation?
Not much actually. It now basically boils down to writing a new
splitter object. Everything else remains the same. So basically, I would
say that it amounts to ~300 lines of Cython (out of the 2300 lines in our
Mathieu,
I have no experience with forests on sparse data, nor have I seen much work
on the topic. I would be curious to investigate however; there may be
problems for which this is useful. I know that Arnaud tried forests on
(densified) 20newsgroups and it seems to work well actually.
In
By the way, if any of you would like to recycle this poster, sources are
available at
https://github.com/glouppe/talk-sklearn-mloss-nips2013/tree/master/poster
On 16 January 2014 16:41, Arnaud Joly a.j...@ulg.ac.be wrote:
Hi everyone,
There is a local event at my university which is called
Dear Caleb,
The current implementation does not allow for that. You can do as suggested
by Lars though, if this is practical for you.
Gilles
On 12 January 2014 16:03, Caleb cloverev...@yahoo.com wrote:
Hi all,
In the current implementation of the decision tree, data is split according
Dear Adolfo,
You could instead use pickle, which will create a single file.
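A sketch (the file name and estimator settings are illustrative):

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesRegressor

X, y = load_iris(return_X_y=True)
reg = ExtraTreesRegressor(n_estimators=10, random_state=0).fit(X, y)

# Unlike joblib.dump without compression, this writes a single file
with open("model.pkl", "wb") as f:
    pickle.dump(reg, f)

with open("model.pkl", "rb") as f:
    reg2 = pickle.load(f)
```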
Best,
Gilles
On 7 January 2014 16:49, Adolfo Martinez amarti...@intelimetrica.comwrote:
Hello, I have a trained ExtraTreesRegressor saved using joblib.dump
(without compress). This creates more than ten thousand
Hi Thomas,
Indeed, gini and entropy are the only supported impurity criteria for
classification. I don't think we have plans right now to add others - which
one do you have in mind?
how feasible would it be to have the option of passing a custom function to
the tree or forest to use in splitting?
Hi,
Thanks for pointing this Andy! I think it would help indeed to set some
coarse deadline for the next release. This would help us get motion and get
things done. End of december or beginning of January would be best for me.
On my side, I don't plan to contribute anything big in the meantime.
Hi Nigel,
What is the proportion of English versus non-English tweets in your data?
It may be the case that your dataset is unbalanced.
Gilles
On 18 October 2013 09:32, Nigel Legg nigel.l...@gmail.com wrote:
I have a set of tweets, and I am trying to use an SVM classifier to class
them as
The branch is now deleted ;)
On 17 October 2013 06:35, Robert Layton robertlay...@gmail.com wrote:
I know, I'm very sorry.
I made a new branch directly from upstream/master, then pushed without
checking where that push was going to.
Can someone please delete this branch for me? I dare not
Hi Robert,
Unfortunately, algorithms for recommender systems are not planned in
scikit-learn in the short or mid-term.
I would advise you to look at other libraries that are specifically
targeting that problem. In particular, GraphLab (http://graphlab.org/) is
among the best libraries for
on that file.
Jake
On Wed, Sep 25, 2013 at 5:19 AM, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
I have just put together a quick and dirty script that does that. It
extracts the number of commits for all developers, for all files on a
git directory. It then computes the 3 nearest
On 25 September 2013 19:05, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
On 09/25/2013 06:44 PM, Olivier Grisel wrote:
2013/9/25 Andreas Mueller amuel...@ais.uni-bonn.de:
On 09/25/2013 04:15 PM, Jacob Vanderplas wrote:
Very cool!
One quick comment: I'd probably normalize the values in the
...@gmail.com wrote:
Congrats to both of you - enjoy the skiing (boy, I'm jealous)!
2013/9/5 Gilles Louppe g.lou...@gmail.com
Congratulations Gael! Ours is also officially accepted, so you can count
on me.
Gilles
On Thursday, 5 September 2013, Gael Varoquaux
gael.varoqu...@normalesup.org
Dear Yegle,
1) What does your data represent? Are your features numbers or concepts?
In the first case, you should try to build your estimator without
encoding anything. In the second case, it might also not be necessary
to one-hot encode your categorical features. Try with and without
encoding
for further
promotion of the project.
Gilles
On 22 August 2013 16:26, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
It is more than likely that I will be there this year - given the
reviews of our paper, I would be surprised if it was rejected.
What sort of talk would you have in mind Nelle?
Gilles
Hi,
As Roland says, this is a Numpy question rather than a scikit-learn
question. If you want to ignore specific fields then it indeed amounts to
removing the corresponding columns in your X array before feeding it to
your estimator.
(Note however that Random Forests have the advantages of being
Hi,
Please be more specific. What are the error messages?
Best,
Gilles
On 13 August 2013 14:14, MORGANDON G doh...@mac.com wrote:
Can someone direct me to the correct place to find help with an
installation problem I have on the Mac?
I used MacPorts and it said everything went just
wrote:
command not found
On Aug 13, 2013, at 8:21 PM, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
Please be more specific. What are the error messages?
Best,
Gilles
On 13 August 2013 14:14, MORGANDON G doh...@mac.com wrote:
Can someone direct me to the correct place to find help
'
is this good news?
Don
On Aug 13, 2013, at 8:38 PM, Gilles Louppe g.lou...@gmail.com wrote:
What command are you typing?
To use scikit-learn, you have to either use a Python shell (i.e., using the
python command in a terminal) or execute a Python script (using python
script.py). Are you familiar
Hi,
I'm well aware I can pickle it, but I would like to avoid having to write
2 files - otherwise I would just write the classes to a text file.
You can pickle several Python objects using the same file handler.
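For instance (a toy sketch; the two objects stand in for your classifier and your class labels):

```python
import pickle

model = {"weights": [0.1, 0.9]}   # stands in for a fitted classifier
classes = ["spam", "ham"]

# Successive dumps to the same file handle...
with open("bundle.pkl", "wb") as f:
    pickle.dump(model, f)
    pickle.dump(classes, f)

# ...are read back with successive loads, in the same order
with open("bundle.pkl", "rb") as f:
    model2 = pickle.load(f)
    classes2 = pickle.load(f)
```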
Gilles
Lars,
Well, I'm confused now, sklearn.__version__ says 0.14-git. Did I
- discuss the with the tree growers guys on how to best parallelize
random forest trainings on multi-core without copying the training set
in memory
- either with threads in joblib and nogil statements in the
inner loops of the (new) Cython code,
- or with shared memory and the
Hi Theofilos,
That would be great! I think it could easily be done by adding new
Criterion classes into the _tree.pyx file.
Note however that we are currently refactoring the core tree module.
It may be best to wait for it to be merged before you start coding -
otherwise you may end up with lots of
Hi,
Such ensembles are not implemented at the moment.
Gilles
On 21 June 2013 09:59, Maheshakya Wijewardena pmaheshak...@gmail.com wrote:
Hi all,
I would like to know whether we have bootstrap aggregating functionality in
scikit-learn library. If so, How do I use that?
(If it doesn't exist
Hi,
This looks like the dataset from the Amazon challenge currently
running on Kaggle. When one-hot-encoded, you end up with roughly
15000 binary features, which means that the dense representation
requires at least 32000*15000*4 bytes to hold in memory (or even twice
as much depending on
On 3 June 2013 08:43, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
On 06/03/2013 05:19 AM, Joel Nothman wrote:
However, in these last two cases, the number of possible splits at a
single node is linear in the number of categories. Selecting an
arbitrary partition allows exponentially many
Hi,
The main question is, what is your definition of an important variable?
Gilles
On 1 June 2013 14:22, o m oda...@gmail.com wrote:
I've been playing around with Lasso and Lars, but there's something that
bothers me about standardization.
If I don't standardize to N(0, 1), these procedures
Hi Ken,
I share and understand your concerns about the rigidity of the current
implementation.
I like using Extremely Randomized Trees, but I'm looking for more flexibility
in generating them. In particular, I'd like to be able to specify my own
criterion and split finding algorithm. I'm
Thanks for solving the Travis bug :)
On 1 May 2013 21:15, Gael Varoquaux gael.varoqu...@normalesup.org wrote:
On Wed, May 01, 2013 at 06:19:34PM +0200, Olivier Grisel wrote:
I spend a couple of hours fixing the build infrastructures:
Wow! Thank you so much. These are well-spent hours.
G
Hi Youssef,
Regarding memory usage, you should know that it'll basically blow up if you
increase the number of jobs. With the current implementation, you'll need
O(n_jobs * |X| * 2) in memory space (where |X| is the size of X, in bytes).
That issue stems from the use of joblib which basically
Congratulations are in order :-)
On 17 April 2013 08:06, Peter Prettenhofer peter.prettenho...@gmail.comwrote:
That's great - congratulations Olivier!
Definitely, no pressure ;-)
2013/4/17 Ronnie Ghose ronnie.gh...@gmail.com
wow :O congrats
On Tue, Apr 16, 2013 at 7:17 PM, Mathieu
Hi Olivier,
There are indeed several ways to get feature importances. As often, there
is no strict consensus about what this word means.
In our case, we implement the importance as described in [1] (often cited,
but unfortunately rarely read...). It is sometimes called gini importance
or mean
Note that you can get perfect scores (either 0.0 or 1.0) simply by
setting n_estimators=1. This is why you should use this measure with
caution.
On 20 March 2013 15:27, Lars Buitinck l.j.buiti...@uva.nl wrote:
2013/3/20 paul.czodrow...@merckgroup.com
I was just about to say that discarding
Hi,
Short answer: you can't.
Longer answer: If you use as training samples the whole images (with faces
somewhere in there), then your model is learning to discriminate between
your 2 categories, from the whole images, with **no** information about
where the faces are actually located. As such,
I feel like the About us section on the homepage shouldn't be there.
I'd rather put an About link somewhere else than putting this in
front on the home page. Also, I would use the space that we now have
on the front page to highlight more important aspects of the package.
On 5 March 2013 14:46,
Hi David,
I think you should have a look at sklearn.tree.export_graphviz. It
will generate a picture of the tree for you.
- Reference:
http://scikit-learn.org/dev/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz
- Example:
Hello,
You might achieve what you want by using sample weights when fitting
your forest (See the 'sample_weight' parameter). There is also a
'balance_weights' method from the preprocessing module that basically
generates sample weights for you, such that classes become balanced.
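A sketch of the sample_weight route (the weighting scheme shown, inverse class frequency, is one common choice; the data is synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = np.array([0] * 90 + [1] * 10)  # strongly imbalanced classes

# Weight each sample inversely to its class frequency
counts = np.bincount(y)
sample_weight = (len(y) / (len(counts) * counts))[y]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```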
Hi David,
What is a SLFN?
Do you have any pointer to a reference paper?
Best,
Gilles
On 5 February 2013 00:51, David Lambert caliband...@gmail.com wrote:
Hi,
I'm new to the list so please forgive my trespasses...
I've nearly completed an implementation of the Extreme Learning Machine
Hi Andy,
Do we really need to take it down now? It still gets new answers every day
(28 new answers since your first post 3 days ago). It doesn't hurt to leave
it online for a week or so, does it?
Just my 2 cents.
--
Everyone
I don't know about Lube and Oil, but we have some Filters in the
feature_selection package.
HTH,
G
On 30 January 2013 16:04, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
On 01/30/2013 03:59 PM, Brian Holt wrote:
Is it any one of these?
It might be Local Outlier Factor, as we already
Great job to all of you :)
Gilles
On 22 January 2013 07:57, Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
Great work guys - especially Andy - thanks a lot for making this happen!
best,
Peter
2013/1/22 Gael Varoquaux gael.varoqu...@normalesup.org:
On Tue, Jan 22, 2013 at
Just to let you know, it is basically useless to grid-search over the
n_estimators parameter in your forests. The higher, the better.
However, you might try to tune min_samples_split (from 1 to
n_features). It is one of the few parameters that will actually lead
to any improvement in terms of
Hi,
Can you give use the full script that is used to load your model?
In that script, have you imported my_analyzer?
Best,
Gilles
On 15 January 2013 13:51, JAGANADH G jagana...@gmail.com wrote:
Hi All,
I was trying to save and load a model (Text Classificaion with SVM) using
joblib. In the
... or more simply:
pipeline.fit(X, y, nb__sample_weight=sample_weight)
On 10 January 2013 15:20, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
I don't know how it interfaces with NLTK's SklearnClassifier, but if
you can work your way using only Scikit-Learn for training, then you
can pass
Hi David,
On 9 January 2013 02:14, David Broyles sj.clim...@gmail.com wrote:
Hi,
I'm pretty new to scikit-learn. I've generated a random forest
(classification) of 100 trees using default attributes. My data set has
over 2M examples.
2 questions:
1) I've noticed the size of the pickled
Hi Andreas!
... and Merry Christmas to all!
Quick and naive question: what is the point in cross-validating the
number of trees in RandomForest (or in Extra-Trees)? The rule simple
is simple: the more, the better.
Gilles
On 25 December 2012 13:07, Andreas Mueller amuel...@ais.uni-bonn.de
.
Thanks,
Gilles
On Tuesday, 25 December 2012, Gilles Louppe g.lou...@gmail.com wrote:
Hi Andreas!
... and Merry Christmas to all!
Quick and naive question: what is the point in cross-validating the
number of trees in RandomForest (or in Extra-Trees)? The rule is
simple: the more
Hi,
Yes, since decision trees handle multi-output problems, classes_[i] is
an array containing the classes for the i-th output. Hence classes_[0]
is the array you are looking for when `y` is 1D.
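A toy sketch of the multi-output case described above (shapes and values are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
Y = np.column_stack([np.arange(20) % 2, np.arange(20) % 3])  # two outputs

clf = DecisionTreeClassifier().fit(X, Y)
classes_first = clf.classes_[0]   # classes of the first output
classes_second = clf.classes_[1]  # classes of the second output
```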
I guess we could transform classes_ directly into that array if the
decision tree is trained on a
we could have a method
``supports_multi_output`` that returns a boolean so we know what shape the
classes_ are given some arbitrary clf? Or just introspect it?
Doug
On Thu, Nov 29, 2012 at 9:57 AM, Gilles Louppe g.lou...@gmail.com wrote:
Hi,
Yes, since decision trees handle multi-output
`i` is the output index, corresponding to the i-th column of y.
On 29 November 2012 22:00, Lars Buitinck l.j.buiti...@uva.nl wrote:
2012/11/29 Gilles Louppe g.lou...@gmail.com:
Yes, since decision trees handle multi-output problems, classes_[i] is
an array containing the classes for the i-th