Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

2015-01-06 Thread Andy
On 01/06/2015 01:21 PM, Morgan Hoffman wrote: Hi Andy, Thanks for your help. Is there something in the scikit-learn documentation (or any other resource) that explains why the kernel matrix at test time needs to be the kernel between the test data and the training data? I am quite new to mach

Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

2015-01-06 Thread Morgan Hoffman
0.7 is really a 0. Thanks! Date: Tue, 6 Jan 2015 12:45:06 -0500 From: [email protected] To: [email protected] Subject: Re: [Scikit-learn-general] Cross validation with a pre-computed kernel The kernel matrix at test time needs to be the kernel

Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

2015-01-06 Thread Andy
I am a bit confused as to why your code doesn't crash on the call to the scaler. What is the shape of train_gram_matrix and test_gram_matrix? On 01/06/2015 12:27 PM, Morgan Hoffman wrote: Hi, I am trying to do a k-fold cross validation with a precomputed kernel. However, I end up with an erro

Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

2015-01-06 Thread Andy
The kernel matrix at test time needs to be the kernel between the test data and the training data. Which I guess is not what get_gram_matrix does. Why are you applying the MinMaxScaler to the gram matrix? I'm not sure that makes sense... Without the scaler you could just do print(cross_val_sc
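A minimal sketch of what Andy describes, assuming a linear kernel and the current scikit-learn API (the cross-validation utilities have since moved to sklearn.model_selection): the matrix passed at fit time is the kernel among training samples, while the matrix passed at predict/score time is the kernel between the test samples and the training samples.

```python
# Sketch: SVC with a precomputed (here: linear) kernel.
# K_train = kernel(X_train, X_train), shape (n_train, n_train)
# K_test  = kernel(X_test,  X_train), shape (n_test,  n_train)
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

K_train = X_train @ X_train.T   # kernel among training samples
K_test = X_test @ X_train.T     # kernel of test samples vs TRAINING samples

clf = SVC(kernel='precomputed').fit(K_train, y_train)
score = clf.score(K_test, y_test)
```

The key point is the shape of K_test: it has one row per test sample but one column per *training* sample, because the decision function is expressed in terms of the training (support) vectors.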

[Scikit-learn-general] Cross validation with a pre-computed kernel

2015-01-06 Thread Morgan Hoffman
Hi, I am trying to do a k-fold cross validation with a precomputed kernel. However, I end up with an error message that looks like this: Traceback (most recent call last): File "kfold_simple_data.py", line 64, in score = clf.score(test_gram_matrix, test_labels) File "/usr/local/lib/python2

Re: [Scikit-learn-general] cross validation with random forests

2014-09-29 Thread Andy
Maybe some of the tree huggers can say something about that ;) Below are my best guesses. I am surprised to see that the docs say no regularization is usually best. I would not use such large upper bounds as you did, and I would never search the full range, but rather steps to get only a few cand

Re: [Scikit-learn-general] cross validation with random forests

2014-09-27 Thread Romaniuk, Michal
Hi Satra, In my experience, adjusting max_features can make some difference (I work with image data). Cheers, Michal

Re: [Scikit-learn-general] cross validation with random forests

2014-09-27 Thread Satrajit Ghosh
thanks andy. are there any general heuristics for these parameters - given that their ranges are over the samples? max_depth = range(1, nsamples) or min_samples_leaves = range(1, nsamples) also related question: given that nsamples would actually depend on the cv method of the GridSearchCV, is t

Re: [Scikit-learn-general] cross validation with random forests

2014-09-26 Thread Andy
Hi Satra. You should set "n_estimators" as high as you can afford time and memory wise, and then cross-validate over (at least) one of the regularization parameters, for example over max_depth or min_samples_leaves. You can also search over max_features. Cheers, Andy On 09/26/2014 10:24 PM,
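A sketch of the recipe Andy outlines, with illustrative (not recommended) parameter values: fix n_estimators as high as time and memory allow, then grid-search over one or two of the regularization parameters.

```python
# Sketch: fix n_estimators, grid-search the regularization parameters.
# The grid values below are placeholders, not tuned recommendations.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

param_grid = {
    "max_depth": [3, 5, 10, None],   # depth limit regularizes the trees
    "max_features": ["sqrt", None],  # features considered at each split
}
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid,
    cv=5,
)
grid.fit(X, y)
best = grid.best_params_
```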

Re: [Scikit-learn-general] cross validation with random forests

2014-09-26 Thread Satrajit Ghosh
hi folks, what are some useful ranges of parameters to throw into a grid search? and are there specific difference between randomforests and extra trees? i understand one could try different impurity measures for classification, but any suggestions on sensitivity of other parameters would be nice.

Re: [Scikit-learn-general] cross validation with random forests

2014-09-25 Thread Andy
On 09/23/2014 11:50 PM, Pagliari, Roberto wrote: I’m a bit confused as to why gridsearchCV is not needed with random forests. I understand that with RF, each tree will only get to see a partial representation of the data. Why do you say GridSearchCV is not needed? I think it should always b

Re: [Scikit-learn-general] cross validation with random forests

2014-09-23 Thread Joel Nothman
You can indeed tune parameters of the RF with grid search, and the score method will be used although you could specify a different task metric to GridSearchCV's scoring parameter. On 24 September 2014 07:50, Pagliari, Roberto wrote: > I’m a bit confused as to why gridsearchCV is not needed with

[Scikit-learn-general] cross validation with random forests

2014-09-23 Thread Pagliari, Roberto
I'm a bit confused as to why gridsearchCV is not needed with random forests. I understand that with RF, each tree will only get to see a partial representation of the data. However, if I wanted to tune some parameters of the RF, wouldn't I still need to do gridsearch? If that is the case, does

Re: [Scikit-learn-general] cross-validation

2013-12-18 Thread José Ricardo
Hi Jian, 1. your pipeline probably has other sources of non-determinism. SVC also has a random_state parameter, for example. You should define all random_state parameters in your pipeline. 2. Yes and yes. Your best shot is to split them randomly, AFAIK. Best regards, José Ricardo On Thu, Dec

[Scikit-learn-general] cross-validation

2013-12-12 Thread Su, Jian, Ph.D.
Hello, I am using pipeline and grid to find the best hyperparameters, as in the code at the end of the post. Here are two questions: 1. Even though I set random_state=0, the results are not the same every time. How can I find the "truth"? 0.867933723197 {'clf__bootstrap': False, 'clf__max_depth': 10, '

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Jaques Grobler
Makes sense to me to deprecate here +1 2013/7/31 Olivier Grisel > +1 for deprecating boolean mask for CV as well.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Olivier Grisel
+1 for deprecating boolean mask for CV as well.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Gael Varoquaux
On Wed, Jul 31, 2013 at 09:14:15AM +1000, Joel Nothman wrote: > What is the intention behind indices=False; Old design oversight (aka historical reasons). > why not deprecate it and simplify the API and code? (And speed up > indexing by using np.take.) +1! Making things simpler is always better.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Alexandre Gramfort
hi, indeed we could stick to indices and use np.take whenever possible. In [33]: A = np.random.randn(500, 500) In [34]: idx = np.unique(np.random.randint(0, 499, 400)) In [35]: mask = np.zeros(500, dtype=np.bool) In [36]: mask[idx] = True In [37]: %timeit A[idx] 1000 loops, best of 3: 1.79 ms per

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-30 Thread Joel Nothman
On Wed, Jul 31, 2013 at 4:08 PM, Lars Buitinck wrote: > 2013/7/31 Joel Nothman : > > I am wondering why there is a need to support the indices=False case in > > cross_validation. Indices are superior in that they can be used with > np.take > > and with sparse matrices. And most of the standard cv

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-30 Thread Lars Buitinck
2013/7/31 Joel Nothman : > I am wondering why there is a need to support the indices=False case in > cross_validation. Indices are superior in that they can be used with np.take > and with sparse matrices. And most of the standard cv implementations output > indices that are converted into boolean

[Scikit-learn-general] cross-validation and indices=False

2013-07-30 Thread Joel Nothman
Hi, I'm sure you're all burnt out from what looks like a great sprint; thanks for all that work and congratulations on the RC! So I apologise for the bad timing. I am wondering why there is a need to support the indices=False case in cross_validation. Indices are superior in that they can be used
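A sketch of the point under discussion, using the API scikit-learn eventually settled on (splitters yield integer indices): indices compose directly with np.take and with sparse matrices, which a boolean mask historically did not.

```python
# Sketch: CV splitters yield integer indices, which work with both
# np.take on dense arrays and row indexing on sparse matrices.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
X_sparse = csr_matrix(X)

train_idx, test_idx = next(KFold(n_splits=5).split(X))

X_train_dense = np.take(X, train_idx, axis=0)  # fast fancy indexing
X_train_sparse = X_sparse[train_idx]           # works on sparse rows too
```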

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-14 Thread Robert Layton
On 14 January 2013 19:43, Andreas Mueller wrote: > Hi Robert. > Not sure if you saw my mail: > In current master, this is fixed! > See > https://github.com/scikit-learn/scikit-learn/issues/1137 > and > https://github.com/scikit-learn/scikit-learn/pull/1443 > > Best, > Andy

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-14 Thread Andreas Mueller
Hi Robert. Not sure if you saw my mail: In current master, this is fixed! See https://github.com/scikit-learn/scikit-learn/issues/1137 and https://github.com/scikit-learn/scikit-learn/pull/1443 Best, Andy

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-14 Thread Robert Layton
On 14 January 2013 17:42, Gael Varoquaux wrote: > > I've been having a lot of trouble loading as a numpy array. I know > > generally how to do it, but I must be doing it wrong since the numpy > > array can't fit in memory, while the "list of strings" representation > > does > > I believe that i

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread amueller
there is a fix for that in current master. check arrays now has 'allow lists'. andy Robert Layton schrieb: >When using cross_validation.X, all arrays are checked in the normal way >-- >using check_arrays. >I am developing code that uses string documents as input, so I have a >list >of strings

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread Gael Varoquaux
> I've been having a lot of trouble loading as a numpy array. I know > generally how to do it, but I must be doing it wrong since the numpy > array can't fit in memory, while the "list of strings" representation > does I believe that it's because the strings are stored in a 'string representation

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread Robert Layton
On 14 January 2013 16:10, Kenneth C. Arnold wrote: > Why not use numpy arrays of strings all along? Their importance here is > fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself > on demand? > > -Ken > On Jan 13, 2013 11:04 PM, "Robert Layton" wrote: > >> When using cross_

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread Kenneth C. Arnold
Why not use numpy arrays of strings all along? Their importance here is fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself on demand? -Ken On Jan 13, 2013 11:04 PM, "Robert Layton" wrote: > When using cross_validation.X, all arrays are checked in the normal way -- > using

[Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread Robert Layton
When using cross_validation.X, all arrays are checked in the normal way -- using check_arrays. I am developing code that uses string documents as input, so I have a list of strings as the "data" and a numpy array as classes as normal. (In case anyone doesn't know, my research area is authorship ana

Re: [Scikit-learn-general] Cross validation iterator - leave one out per class

2012-11-29 Thread Gael Varoquaux
On Thu, Nov 29, 2012 at 03:53:03PM +0100, Philipp Singer wrote: > Does this even make sense? ;) Yes, I think so. > If so, is there some easy way of doing so in scikit learn? Faced with a similar problem, I would write my own cross-validation class. G
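A sketch of the custom iterator Gaël suggests; the function name and sampling scheme here are illustrative, not a scikit-learn API. Each round holds out exactly one randomly chosen sample per class and trains on everything else.

```python
# Sketch of a "leave one out per class" cross-validation generator.
import numpy as np

def leave_one_out_per_class(y, n_rounds, seed=0):
    """Yield (train_idx, test_idx); test holds one sample per class."""
    y = np.asarray(y)
    rng = np.random.RandomState(seed)
    classes = np.unique(y)
    for _ in range(n_rounds):
        # pick one random index from each class as the test set
        test = np.array([rng.choice(np.where(y == c)[0]) for c in classes])
        train = np.setdiff1d(np.arange(len(y)), test)
        yield train, test

# The class sizes from the original post: 6, 10, and 4 samples.
y = np.array([0] * 6 + [1] * 10 + [2] * 4)
splits = list(leave_one_out_per_class(y, n_rounds=10))
```

Each (train, test) pair can be passed anywhere scikit-learn accepts an iterable of index splits, e.g. the cv parameter of cross_val_score.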

Re: [Scikit-learn-general] Cross validation iterator - leave one out per class

2012-11-29 Thread Andreas Mueller
Am 29.11.2012 15:53, schrieb Philipp Singer: > Hey! > > I have the following scenario: > > I have e.g., three different classes. For class 0 I may have 6 samples, > for class 1 ten and for class 2 four. > > I now want to do cross validation ten times, but in my case I want to > train on all samples

[Scikit-learn-general] Cross validation iterator - leave one out per class

2012-11-29 Thread Philipp Singer
Hey! I have the following scenario: I have e.g., three different classes. For class 0 I may have 6 samples, for class 1 ten and for class 2 four. I now want to do cross validation ten times, but in my case I want to train on all samples for a class except one which I want to use as test data.

Re: [Scikit-learn-general] cross validation cv parameter

2012-08-13 Thread Zach Bastick
Hi Andy, Yes, it is regression, so that explains it. Here is the script and data that produced the output: https://dl.dropbox.com/u/74279156/accuracy.zip Thanks, Zach On 13 August 2012 16:21, Andreas Mueller wrote: > Hi Zach. > If this is related to your previous problems, let me just > answe

Re: [Scikit-learn-general] cross validation cv parameter

2012-08-13 Thread Andreas Mueller
Hi Zach. If this is related to your previous problems, let me just answer 1: the values depend on what error score is used. If your problem is a regression problem, the standard score is r2, which can become negative. That the CV values vary so much is really a bit odd. Could you post a gist with
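A minimal illustration of Andy's point that R², the default score for regressors, can be negative: a model that predicts worse than simply returning the mean of the targets scores below zero.

```python
# R^2 = 1 - SS_res / SS_tot; it is 1.0 for perfect predictions and
# drops below 0 when predictions are worse than the constant mean.
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]
r2_perfect = r2_score(y_true, [1.0, 2.0, 3.0])  # 1.0
r2_bad = r2_score(y_true, [3.0, 1.0, 5.0])      # 1 - 9/2 = -3.5
```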

[Scikit-learn-general] cross validation cv parameter

2012-08-13 Thread Zach Bastick
Changing the cv parameter (number of iterations) in cross_val_score() really changes the returned scores. Increasing the CV doesn't necessarily mean that the returned scores stabilise. Instead, they get worse, and only get better later. I have included the output of increasing the CV below. M

Re: [Scikit-learn-general] cross-validation example digits

2012-06-08 Thread Alexandre Gramfort
it indeed seems like a C scaling problem. C range is too high to see something. commit pushed Alex On Fri, Jun 8, 2012 at 10:02 PM, Satrajit Ghosh wrote: > is this example meant to look like this or is this related to the scale C > discussion. > > http://scikit-learn.org/stable/auto_examples/ex

[Scikit-learn-general] cross-validation example digits

2012-06-08 Thread Satrajit Ghosh
is this example meant to look like this or is this related to the scale C discussion. http://scikit-learn.org/stable/auto_examples/exercises/plot_cv_digits.html cheers, satra