Hi,
I was surprised to read that class weights are implemented via sampling
for LogisticRegression. Is this really the case?
From the LogisticRegression docstring:
---
class_weight : {dict, 'auto'}, optional
Over-/undersamples the samples of each class according to the given
weights. If not given,
Got it thanks! One last question:
Are there heuristics or rules of thumb about which distributions should be
used, or tend to work best, with gradient boosting classifiers (tree depth,
minimum number of samples, learning rate, etc.)?
Thank you,
-Original Message-
From: Vlad Niculae
Last time I checked, liblinear didn't support sample weights, just class
weights (one for positive samples and another for negative samples).
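For reference, a minimal sketch of the class-weight (not sample-weight) interface using the dict form of `class_weight`; the dataset, imbalance ratio, and weight values below are illustrative choices, not from this thread:

```python
# Hedged sketch: one weight per class (positive vs. negative),
# not per sample. The weight dict {0: 1.0, 1: 5.0} is made up.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

clf = LogisticRegression(class_weight={0: 1.0, 1: 5.0})
clf.fit(X, y)
print(clf.score(X, y))
```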
Mathieu
On Tue, Apr 21, 2015 at 5:56 AM, iBayer mane.d...@googlemail.com wrote:
Hi,
I was surprised to read that class weights are implemented via
More important than the statement from Sturla, which I may or may not
agree with depending on the modeling assumption (and every p-value rests
on a modeling assumption): the logistic regression in scikit-learn is a
penalized logistic model, so the closed-form formulas for p-values are not valid.
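A hedged sketch of that point on synthetic data (the dataset and the `C` values are illustrative): the default `LogisticRegression` applies an L2 penalty with `C=1.0`, which shrinks coefficients relative to a nearly unpenalized fit, so textbook maximum-likelihood p-value formulas do not apply to them directly.

```python
# Illustration (assumed setup): the default fit is L2-penalized, so its
# coefficients are shrunk compared to a fit with the penalty turned
# effectively off via a huge C.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

penalized = LogisticRegression(C=1.0).fit(X, y)   # default penalty strength
near_mle = LogisticRegression(C=1e6).fit(X, y)    # penalty effectively off

print(np.abs(penalized.coef_).sum())
print(np.abs(near_mle.coef_).sum())
```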
G
On
Oh, I mean that it is a problem of the t-SNE implementation; it is not a
problem of the MNIST implementation. I don't know how that could
happen. :D
On 04/20/2015 08:55 AM, Jason Wolosonovich wrote:
Oh wow, very cool. Thank you very much for the assistance and info Alexander!
-Original
Hi Roberto
What does None do for max_depth?
Copy-pasted from
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
“If None, then nodes are expanded until all leaves are pure or until all leaves
contain less than min_samples_split samples.”
In particular,
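That behavior is easy to check empirically; a small sketch (iris is just an assumed example dataset, not one mentioned in the thread):

```python
# Sketch: with max_depth=None the tree keeps splitting until leaves are
# pure (or hold fewer than min_samples_split samples), so the final
# depth is determined by the data, not by a preset cap.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X, y)

print(tree.get_depth())  # depth chosen by the data
print(tree.score(X, y))  # pure leaves give perfect training accuracy
```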
Hi Vlad,
when using randomized grid search, does sklearn look into intermediate values,
or does it sample from the values provided in the parameter grid?
Thank you,
From: Vlad Niculae [zephy...@gmail.com]
Sent: Monday, April 20, 2015 12:50 PM
To:
On 04/19/2015 08:18 AM, Joel Nothman wrote:
On 17 April 2015 at 13:52, Daniel Vainsencher
daniel.vainsenc...@gmail.com mailto:daniel.vainsenc...@gmail.com wrote:
On 04/16/2015 05:49 PM, Joel Nothman wrote:
I more or less agree. Certainly we only need to do one searchsorted per
From the example in the documentation:
# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(1, 11),
              "min_samples_leaf": sp_randint(1, 11),
If you have a continuous parameter you should really, really use
continuous distributions!
On 04/20/2015 12:58 PM, Pagliari, Roberto wrote:
Hi Vlad,
when using randomized grid search, does sklearn look into intermediate
values, or does it sample from the values provided in the parameter
Yes, I agree. From the example, though, my understanding is that you can only
pass arrays, not functions; isn't that true?
Thank you,
From: Andreas Mueller [t3k...@gmail.com]
Sent: Monday, April 20, 2015 2:55 PM
To:
The example you cite contains these lines:
"max_features": sp_randint(1, 11),
"min_samples_split": sp_randint(1, 11),
"min_samples_leaf": sp_randint(1, 11),
Those are not lists, but distribution objects from scipy (see the import at
the top of the example, `from
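For what it's worth, a quick sketch of what such a distribution object does (the `(1, 11)` bounds are the ones from the example above):

```python
# Sketch: sp_randint is scipy.stats.randint; RandomizedSearchCV draws
# candidate values by calling its .rvs() method, so any integer in
# [1, 11) can be sampled, not just values from a fixed list.
from scipy.stats import randint as sp_randint

dist = sp_randint(1, 11)
samples = dist.rvs(size=5, random_state=0)
print(samples)  # five integers between 1 and 10 inclusive
```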
The User Guide has an example that better illustrates what Andy meant: for
continuous parameters such as C and gamma in a gaussian kernel SVM, you should
use a continuous distribution (e.g. exponential):
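A hedged sketch in the spirit of that User Guide example, using exponential distributions for C and gamma of an RBF-kernel SVC; the dataset, scale parameters, and `n_iter` are illustrative choices, and the `sklearn.model_selection` import path is the modern spelling (this 2015 thread predates it):

```python
# Sketch: continuous expon distributions let the randomized search try
# intermediate C and gamma values instead of a fixed grid.
import scipy.stats as st
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_dist = {"C": st.expon(scale=100), "gamma": st.expon(scale=0.1)}

search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist,
                            n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```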
Oh wow, very cool. Thank you very much for the assistance and info Alexander!
-Original Message-
From: afabisch [mailto:afabi...@mailhost.informatik.uni-bremen.de]
Sent: Saturday, April 18, 2015 9:15 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general]