Hello,
a naive question about what I should do and what already exists in scikit-learn.
I have a classification problem with two classes, and I know that one
of my features has two different distributions for one of
the classes.
Example made up on the spot (real life is more complicate
Hi Gilles,
On 23 May 2014 15:06, Gilles Louppe wrote:
> Hi Tim,
>
> In principle, what you describe exactly corresponds to the decision tree
> algorithm. You partition the input space into smaller subspaces, on which
> you recursively build sub-decision trees.
>
Exactly. What I was wondering wa
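The partitioning idea Gilles describes can be sketched with scikit-learn directly. This is an illustrative example made up for this note (not the poster's actual data): one feature whose distribution for class 1 is bimodal, which a shallow decision tree handles by splitting the feature axis into subregions.

```python
# Sketch: a decision tree partitioning a 1-D feature whose
# distribution is bimodal for class 1 (synthetic data, assumed setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
# Class 0: a single mode at 0; class 1: a mixture of modes at -4 and +4.
X0 = rng.normal(0.0, 1.0, size=500)
X1 = np.concatenate([rng.normal(-4.0, 1.0, size=250),
                     rng.normal(4.0, 1.0, size=250)])
X = np.concatenate([X0, X1]).reshape(-1, 1)
y = np.array([0] * 500 + [1] * 500)

# Depth 2 is enough: one split per mode of class 1.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.score(X, y))
```

Each internal node of the tree is exactly the kind of subspace split described above, with a sub-tree built recursively on each side.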
Hello,
I was comparing scores from CV with a score obtained from training on a
subset of the data used in the CV, and I got very different answers. This
surprised me; should it have? If not, how do I understand how/why this happens?
I run:
scores = cross_validation.cross_val_score(clf, X_dev, y_dev,
sc
Hi Gilles,
On Thu Feb 19 2015 at 8:35:35 AM Gilles Louppe wrote:
> Hi Tim,
>
> By default, cross_val_score uses StratifiedKFold(shuffle=False) to
> create the train/test folds while train_test_split uses ShuffleSplit.
> The discrepancy you observe might therefore come from either
> shuffling,
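The difference between the two splitters can be seen directly. This sketch uses the modern `sklearn.model_selection` API rather than the thread's deprecated `sklearn.cross_validation` module, on toy data made up for illustration; the point is only that the two strategies pick different test sets, so scores can differ.

```python
# Sketch: StratifiedKFold(shuffle=False) (used by cross_val_score)
# vs ShuffleSplit (the strategy behind train_test_split).
import numpy as np
from sklearn.model_selection import StratifiedKFold, ShuffleSplit

X = np.arange(20).reshape(-1, 1)
y = np.array([0, 1] * 10)  # two balanced classes

skf = StratifiedKFold(n_splits=5, shuffle=False)
ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

# Test indices of the first fold under each strategy.
skf_test = next(iter(skf.split(X, y)))[1]
ss_test = next(iter(ss.split(X, y)))[1]
print(sorted(skf_test))  # contiguous per-class blocks: no shuffling
print(sorted(ss_test))   # a random subset of indices
```

With ordered data, the unshuffled stratified folds follow the original ordering, while `ShuffleSplit` draws a random subset, which is one source of the discrepancy mentioned above.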
Hi,
On Thu Feb 19 2015 at 10:58:26 PM Andy wrote:
> You give the roc_auc_score the result of "predict". You should give it
> the result of "predict_proba".
>
>
Yes! Though for me roc_auc_score complains if I pass the result of
predict_proba (wrong shape) but the output of decision_function() w
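The shape complaint comes from `predict_proba` returning one column per class; `roc_auc_score` wants a single continuous score per sample. Passing the positive-class column works, and so does `decision_function()`, which is already 1-D. A minimal sketch on synthetic data (the classifier and dataset here are assumptions for illustration):

```python
# Sketch of the fix discussed above: give roc_auc_score continuous
# scores, not hard labels, and slice predict_proba to one column.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

# predict_proba has shape (n_samples, 2); take the positive class.
auc_proba = roc_auc_score(y, clf.predict_proba(X)[:, 1])
# decision_function is 1-D and can be passed as-is.
auc_df = roc_auc_score(y, clf.decision_function(X))
print(auc_proba, auc_df)
```

For a model like logistic regression the two AUCs coincide, since `predict_proba` is a monotone transform (the sigmoid) of `decision_function`, and ROC AUC depends only on the ranking of the scores.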
Hi Andreas,
On Sun, Feb 3, 2013 at 6:47 PM, Andreas Mueller
wrote:
>
> On 02/03/2013 06:34 PM, Ronnie Ghose wrote:
>
> just wondering... what do the % signs mean? IIRC they should sum to 100,
> right? In this case the top sums to 38862?
>
> Sorry, the HTML formatting is not so great.
> Ther