[Scikit-learn-general] LogisticRegression: sample vs class weights

2015-04-20 Thread iBayer
Hi, I was surprised to read that class weights are implemented via sampling for LogisticRegression. Is this really the case? From the LR doc:

    class_weight : {dict, 'auto'}, optional
        Over-/undersamples the samples of each class according to the
        given weights. If not given,
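[A minimal sketch of the class_weight interface. Hedged: to my understanding the weights rescale each class's contribution to the penalized loss (e.g. a per-class C in liblinear) rather than literally resampling rows, despite the docstring's wording; the toy data and weight values are invented for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Imbalanced toy problem: ~90% negatives, ~10% positives.
    X, y = make_classification(n_samples=200, weights=[0.9, 0.1],
                               random_state=0)

    # Upweight the rare positive class 5x relative to the negative class.
    clf = LogisticRegression(class_weight={0: 1, 1: 5}).fit(X, y)

    # 'auto' picks weights inversely proportional to class frequencies
    # (renamed 'balanced' in later scikit-learn releases).
    clf_auto = LogisticRegression(class_weight='auto').fit(X, y)
]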

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Got it, thanks! One last question: are there heuristics or rules of thumb about which distributions should be used, or tend to work best, with gradient boosting classifiers (tree depth, minimum number of samples, learning rate, etc.)? Thank you,
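[Not an official rule, but one common pattern, sketched below with parameter ranges invented for illustration: a discrete uniform (randint) for integer-valued parameters, and a continuous distribution for rate-like parameters such as the learning rate:

    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later releases

    # Rough heuristic, not an official rule: discrete uniform for
    # integer-valued parameters, continuous for rate-like ones.
    param_dist = {
        "max_depth": randint(2, 8),            # integers in {2, ..., 7}
        "min_samples_split": randint(2, 20),
        "learning_rate": uniform(0.01, 0.19),  # continuous on [0.01, 0.2)
    }

    X, y = make_classification(n_samples=500, random_state=0)
    search = RandomizedSearchCV(GradientBoostingClassifier(), param_dist,
                                n_iter=20, random_state=0)
    search.fit(X, y)
    print(search.best_params_)
]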

Re: [Scikit-learn-general] LogisticRegression: sample vs class weights

2015-04-20 Thread Mathieu Blondel
Last time I checked, liblinear didn't support sample weights, just class weights (one for positive samples and another for negative samples). Mathieu
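[If true per-sample weights are what's needed, one workaround at the time was SGDClassifier with logistic loss, whose fit() accepts sample_weight. A sketch (not the liblinear path; weight values invented for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    weights = np.where(y == 1, 5.0, 1.0)  # e.g. upweight positive samples

    # loss="log" gives logistic regression (renamed "log_loss" in recent
    # releases); fit() takes per-sample weights, unlike the
    # liblinear-backed LogisticRegression.
    clf = SGDClassifier(loss="log", random_state=0)
    clf.fit(X, y, sample_weight=weights)
]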

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-20 Thread Gael Varoquaux
More important than the statement from Sturla, with which I may or may not agree depending on the modeling assumption (and every p-value rests on a modeling assumption): the logistic regression in scikit-learn is a penalized logistic model, so the closed-form formulas for p-values are not valid. G
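[If closed-form p-values are genuinely needed, the usual route is an unpenalized maximum-likelihood fit in a statistics package; a sketch using statsmodels rather than scikit-learn, on invented toy data:

    import statsmodels.api as sm
    from sklearn.datasets import make_classification

    # Wald p-values assume an *unpenalized* maximum-likelihood fit;
    # statsmodels provides one (scikit-learn's LogisticRegression is
    # penalized, so these formulas do not apply to it).
    X, y = make_classification(n_features=4, n_informative=4,
                               n_redundant=0, random_state=0)
    result = sm.Logit(y, sm.add_constant(X)).fit()
    print(result.pvalues)  # one p-value per coefficient (incl. intercept)
]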

Re: [Scikit-learn-general] TSNE Memory Error

2015-04-20 Thread Alexander Fabisch
Oh, I mean that it is a problem of the t-SNE implementation, not of the MNIST implementation. I don't know how that could happen. :D

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
Hi Roberto,

> what does None do for max_depth?

Copy-pasted from http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html: “If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.” In particular,

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Hi Vlad, when using randomized grid search, does sklearn look into intermediate values, or does it sample from the values provided in the parameter grid? Thank you,

Re: [Scikit-learn-general] Performance of LSHForest

2015-04-20 Thread Daniel Vainsencher
On 04/19/2015 08:18 AM, Joel Nothman wrote: On 17 April 2015 at 13:52, Daniel Vainsencher daniel.vainsenc...@gmail.com wrote: On 04/16/2015 05:49 PM, Joel Nothman wrote: I more or less agree. Certainly we only need to do one searchsorted per

[Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
From the example in the documentation:

    # specify parameters and distributions to sample from
    param_dist = {"max_depth": [3, None],
                  "max_features": sp_randint(1, 11),
                  "min_samples_split": sp_randint(1, 11),
                  "min_samples_leaf": sp_randint(1, 11),
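[The preview cuts off mid-dict; completed into a self-contained form below (a sketch: the documentation example this comes from used RandomForestClassifier on the digits data, if I recall correctly):

    from scipy.stats import randint as sp_randint
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later releases

    # specify parameters and distributions to sample from
    param_dist = {"max_depth": [3, None],
                  "max_features": sp_randint(1, 11),
                  "min_samples_split": sp_randint(1, 11),  # >= 2 required in later releases
                  "min_samples_leaf": sp_randint(1, 11)}

    digits = load_digits()
    search = RandomizedSearchCV(RandomForestClassifier(), param_dist,
                                n_iter=20, random_state=0)
    search.fit(digits.data, digits.target)
    print(search.best_params_)
]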

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Andreas Mueller
If you have continuous parameters you should really, really, really use continuous distributions!

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Yes, I agree. From the example, though, my understanding is that you can only pass arrays, not functions. Isn't that true? Thank you,

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
The example you cite contains these lines:

    "max_features": sp_randint(1, 11),
    "min_samples_split": sp_randint(1, 11),
    "min_samples_leaf": sp_randint(1, 11),

Those are not lists, but distribution objects from scipy (see at the top of the example, `from
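[The mechanic worth knowing, sketched below: RandomizedSearchCV draws from any object exposing .rvs(), so each of the n_iter candidates gets a fresh sample, while plain lists are sampled uniformly from their entries:

    from scipy.stats import randint as sp_randint

    # sp_randint(1, 11) is a frozen scipy distribution over {1, ..., 10};
    # RandomizedSearchCV calls its .rvs() once per sampled candidate.
    dist = sp_randint(1, 11)
    print(dist.rvs(size=5))  # five random integers from 1 to 10 inclusive
]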

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
The User Guide has an example that better illustrates what Andy meant: for continuous parameters such as C and gamma in a Gaussian-kernel SVM, you should use a continuous distribution (e.g. exponential):
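[From memory, that User Guide snippet looks roughly like the sketch below; the scale values are illustrative, not prescriptive:

    from scipy.stats import expon
    from sklearn.datasets import load_iris
    from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later releases
    from sklearn.svm import SVC

    # Exponential distributions let the search probe several orders of
    # magnitude for C and gamma instead of a handful of fixed values.
    param_dist = {"C": expon(scale=100),
                  "gamma": expon(scale=0.1),
                  "kernel": ["rbf"]}

    iris = load_iris()
    search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, random_state=0)
    search.fit(iris.data, iris.target)
    print(search.best_params_)
]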

Re: [Scikit-learn-general] TSNE Memory Error

2015-04-20 Thread Jason Wolosonovich
Oh wow, very cool. Thank you very much for the assistance and info Alexander! -Original Message- From: afabisch [mailto:afabi...@mailhost.informatik.uni-bremen.de] Sent: Saturday, April 18, 2015 9:15 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general]