On 01/06/2015 01:21 PM, Morgan Hoffman wrote:
Hi Andy,
Thanks for your help. Is there something in the scikit-learn
documentation (or any other resource) that explains why the kernel
matrix at test time needs to be the kernel between the test data and
the training data? I am quite new to mach
0.7 is really a 0.
Thanks!
Date: Tue, 6 Jan 2015 12:45:06 -0500
From: [email protected]
To: [email protected]
Subject: Re: [Scikit-learn-general] Cross validation with a pre-computed
kernel
The kernel matrix at test time needs to be the kernel between the test
data and the training data.
I am a bit confused as to why your code doesn't crash on the call to the
scaler.
What is the shape of train_gram_matrix and test_gram_matrix?
On 01/06/2015 12:27 PM, Morgan Hoffman wrote:
Hi,
I am trying to do k-fold cross-validation with a precomputed kernel.
However, I end up with an error.
The kernel matrix at test time needs to be the kernel between the test
data and the training data.
Which I guess is not what get_gram_matrix does.
Why are you applying the MinMaxScaler to the gram matrix? I'm not sure
that makes sense...
Without the scaler you could just do
print(cross_val_score(...))
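To make the point above concrete, here is a minimal sketch of cross-validation with a precomputed kernel. The imports assume the current scikit-learn module layout (this thread predates `sklearn.model_selection`), and the iris data and linear kernel are purely illustrative: at fit time the Gram matrix is train-by-train, at predict time it is test-by-train.

```python
# Sketch: cross-validating SVC(kernel="precomputed").
# Fit needs an (n_train, n_train) kernel; scoring needs (n_test, n_train).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
K = X @ X.T  # full Gram matrix (a linear kernel, just for illustration)

scores = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = SVC(kernel="precomputed")
    clf.fit(K[train][:, train], y[train])            # train x train
    scores.append(clf.score(K[test][:, train], y[test]))  # test x train

print(sum(scores) / len(scores))
```

No scaling of the Gram matrix is done here; the rows/columns are simply sliced consistently on both axes.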
Maybe some of the tree huggers can say something about that ;) Below is
my best guess.
I am surprised to see that the docs say no regularization is usually best.
I would not use such large upper bounds as you did, and I would never
search the full range, but rather use steps that yield only a few candidates.
Hi Satra,
In my experience, adjusting max_features can make some difference (I work with
image data).
Cheers,
Michal
thanks andy.
are there any general heuristics for these parameters - given that their
ranges are over the samples?
max_depth = range(1, nsamples)
or
min_samples_leaf = range(1, nsamples)
also related question: given that nsamples would actually depend on the cv
method of the GridSearchCV, is t
Hi Satra.
You should set "n_estimators" as high as you can afford time and memory
wise, and then cross-validate over (at least) one of the regularization
parameters,
for example over max_depth or min_samples_leaf. You can also search
over max_features.
Cheers,
Andy
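A sketch of the search Andy describes, assuming the current scikit-learn module layout; the dataset, the candidate values, and n_estimators=200 are illustrative choices, not recommendations from the thread. n_estimators is fixed as high as is affordable, and only a few candidates per regularization-like parameter are searched.

```python
# Sketch: fix n_estimators high, grid-search max_depth and max_features
# over a handful of candidates rather than the full range(1, nsamples).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {
    "max_depth": [2, 4, 8, None],    # a few stepped candidates
    "max_features": ["sqrt", None],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid, cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```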
On 09/26/2014 10:24 PM, Satrajit Ghosh wrote:
hi folks,
what are some useful ranges of parameters to throw into a grid search? and
are there specific differences between random forests and extra trees? i
understand one could try different impurity measures for classification,
but any suggestions on sensitivity of other parameters would be nice.
On 09/23/2014 11:50 PM, Pagliari, Roberto wrote:
I’m a bit confused as to why gridsearchCV is not needed with random
forests. I understand that with RF, each tree will only get to see a
partial representation of the data.
Why do you say GridSearchCV is not needed?
I think it should always be used.
You can indeed tune parameters of the RF with grid search; the
estimator's score method will be used by default, although you can
specify a different metric via GridSearchCV's scoring parameter.
On 24 September 2014 07:50, Pagliari, Roberto
wrote:
> I’m a bit confused as to why gridsearchCV is not needed with
Hi Jian,
1. your pipeline probably has other sources of non-determinism. SVC also
has a random_state parameter, for example. You should define all
random_state parameters in your pipeline.
2. Yes and yes. Your best shot is to split them randomly, AFAIK.
Best regards,
José Ricardo
On Thu, Dec
Makes sense to me to deprecate here +1
2013/7/31 Olivier Grisel
> +1 for deprecating boolean mask for CV as well.
>
>
+1 for deprecating boolean mask for CV as well.
On Wed, Jul 31, 2013 at 09:14:15AM +1000, Joel Nothman wrote:
> What is the intention behind indices=False;
Old design oversight (aka historical reasons).
> why not deprecate it and simplify the API and code? (And speed up
> indexing by using np.take.)
+1! Making things simpler is always better.
hi,
indeed we could stick to indices and use np.take whenever possible.
In [33]: A = np.random.randn(500, 500)
In [34]: idx = np.unique(np.random.randint(0, 499, 400))
In [35]: mask = np.zeros(500, dtype=np.bool)
In [36]: mask[idx] = True
In [37]: %timeit A[idx]
1000 loops, best of 3: 1.79 ms per loop
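Completing the equivalence the timing session hints at (written with `dtype=bool`, since the `np.bool` alias used above was later removed from NumPy): an index array and the boolean mask built from it select the same rows, and indices additionally work with `np.take`, which is the argument for keeping them.

```python
# Sketch: index arrays and boolean masks are interchangeable selections.
import numpy as np

A = np.random.randn(500, 500)
idx = np.unique(np.random.randint(0, 499, 400))
mask = np.zeros(500, dtype=bool)
mask[idx] = True

assert np.array_equal(A[idx], A[mask])                 # same rows either way
assert np.array_equal(np.take(A, idx, axis=0), A[idx]) # np.take needs indices
```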
On Wed, Jul 31, 2013 at 4:08 PM, Lars Buitinck wrote:
> 2013/7/31 Joel Nothman :
> > I am wondering why there is a need to support the indices=False case in
> > cross_validation. Indices are superior in that they can be used with
> np.take
> > and with sparse matrices. And most of the standard cv
2013/7/31 Joel Nothman :
> I am wondering why there is a need to support the indices=False case in
> cross_validation. Indices are superior in that they can be used with np.take
> and with sparse matrices. And most of the standard cv implementations output
> indices that are converted into boolean masks.
On 14 January 2013 19:43, Andreas Mueller wrote:
> Hi Robert.
> Not sure if you saw my mail:
> In current master, this is fixed!
> See
> https://github.com/scikit-learn/scikit-learn/issues/1137
> and
> https://github.com/scikit-learn/scikit-learn/pull/1443
>
> Best,
> Andy
Hi Robert.
Not sure if you saw my mail:
In current master, this is fixed!
See
https://github.com/scikit-learn/scikit-learn/issues/1137
and
https://github.com/scikit-learn/scikit-learn/pull/1443
Best,
Andy
On 14 January 2013 17:42, Gael Varoquaux wrote:
> > I've been having a lot of trouble loading as a numpy array. I know
> > generally how to do it, but I must be doing it wrong since the numpy
> > array can't fit in memory, while the "list of strings" representation
> > does
>
> I believe that it's because the strings are stored in a 'string'
> representation.
there is a fix for that in current master. check_arrays now has 'allow_lists'.
andy
Robert Layton schrieb:
>When using cross_validation.X, all arrays are checked in the normal way
>--
>using check_arrays.
>I am developing code that uses string documents as input, so I have a
>list
>of strings
> I've been having a lot of trouble loading as a numpy array. I know
> generally how to do it, but I must be doing it wrong since the numpy
> array can't fit in memory, while the "list of strings" representation
> does
I believe that it's because the strings are stored in a fixed-width
'string' representation: a numpy string array pads every entry to the
length of the longest string, which can blow up memory.
On 14 January 2013 16:10, Kenneth C. Arnold wrote:
> Why not use numpy arrays of strings all along? Their importance here is
> fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself
> on demand?
>
> -Ken
> On Jan 13, 2013 11:04 PM, "Robert Layton" wrote:
>
>> When using cross_
Why not use numpy arrays of strings all along? Their importance here is
fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself
on demand?
-Ken
On Jan 13, 2013 11:04 PM, "Robert Layton" wrote:
> When using cross_validation.X, all arrays are checked in the normal way --
> using
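Ken's suggestion, sketched: cross-validate over an array of integer indices and defer the fancy indexing into the plain Python list until it is needed. The KFold import assumes the current scikit-learn module layout, and the ten-document list is a toy stand-in.

```python
# Sketch: keep documents as a Python list; split X = np.arange(N) instead.
import numpy as np
from sklearn.model_selection import KFold

docs = ["document %d" % i for i in range(10)]  # list of strings, not ndarray
X = np.arange(len(docs))                       # stand-in array of indices

fold_sizes = []
for train, test in KFold(n_splits=5).split(X):
    train_docs = [docs[i] for i in X[train]]   # index the list on demand
    test_docs = [docs[i] for i in X[test]]
    fold_sizes.append((len(train_docs), len(test_docs)))

print(fold_sizes)  # five folds over 10 documents: [(8, 2), (8, 2), ...]
```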
On Thu, Nov 29, 2012 at 03:53:03PM +0100, Philipp Singer wrote:
> Does this even make sense? ;)
Yes, I think so.
> If so, is there some easy way of doing so in scikit learn?
Faced with a similar problem, I would write my own cross-validation
class.
G
---
Am 29.11.2012 15:53, schrieb Philipp Singer:
> Hey!
>
> I have the following scenario:
>
> I have e.g., three different classes. For class 0 I may have 6 samples,
> for class 1 ten and for class 2 four.
>
> I now want to do cross validation ten times, but in my case I want to
> train on all samples
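Gael's suggestion, sketched generically: scikit-learn accepts any iterable of (train_indices, test_indices) pairs as the cv argument, so a hand-rolled policy is just a generator. The split policy shown here (random test sets of four, repeated ten times) is purely illustrative, since Philipp's exact requirement is cut off above; the class sizes 6/10/4 come from his message.

```python
# Sketch: a custom cross-validation scheme as a plain generator of splits.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

y = np.array([0] * 6 + [1] * 10 + [2] * 4)           # class sizes 6, 10, 4
X = np.arange(len(y)).reshape(-1, 1).astype(float)   # toy features

def custom_cv(y, n_repeats=10, seed=0):
    """Yield hand-rolled (train, test) index pairs; the policy is up to you."""
    rng = np.random.RandomState(seed)
    idx = np.arange(len(y))
    for _ in range(n_repeats):
        test = rng.choice(idx, size=4, replace=False)
        yield np.setdiff1d(idx, test), test

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=list(custom_cv(y)))
print(len(scores))  # one score per repetition: 10
```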
Hi Andy,
Yes, it is regression, so that explains it.
Here is the script and data that produced the output:
https://dl.dropbox.com/u/74279156/accuracy.zip
Thanks,
Zach
On 13 August 2012 16:21, Andreas Mueller wrote:
> Hi Zach.
> If this is related to your previous problems, let me just
> answer
Hi Zach.
If this is related to your previous problems, let me just
answer 1: the values depend on what error score is used.
If your problem is a regression problem, the standard score is r2,
which can become negative.
That the CV values vary so much is really a bit odd.
Could you post a gist with
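Andy's point about negative values is easy to verify: r2 compares the model against the constant mean predictor, so predictions worse than the mean score below zero. The numbers below are a made-up illustration.

```python
# Sketch: r2 is 0 for the mean predictor and negative for anything worse.
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]                    # mean is 2.0
print(r2_score(y_true, [2.0, 2.0, 2.0]))    # 0.0  (exactly the mean predictor)
print(r2_score(y_true, [3.0, 1.0, 2.0]))    # -2.0 (worse than the mean)
```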
it indeed seems like a C scaling problem. The C range is too high to see anything.
commit pushed
Alex
On Fri, Jun 8, 2012 at 10:02 PM, Satrajit Ghosh wrote:
> is this example meant to look like this or is this related to the scale C
> discussion.
>
> http://scikit-learn.org/stable/auto_examples/ex