Oh yes, I get that now. All this while I was thinking there was a problem
with the Mac itself, because of a similar issue discussed here:
https://github.com/scikit-learn/scikit-learn/issues/5115.
Thanks a lot for clearing this up. I am going to change the loop and see
if I can run the parallel implementation on the Mac.
I am not that much into the multiprocessing implementation in scikit-learn /
joblib, but I think this could be one issue why your Mac hangs… I’d say that
it’s probably the safest approach to only set the n_jobs parameter for the
innermost object.
E.g., if you have 4 processors, you said the GridSearch…
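For illustration, here is a minimal sketch of that advice (not code from the
original emails; it uses the current sklearn.model_selection imports, and the
toy data, SVC estimator, grid, and the choice of 4 processes are placeholders):
n_jobs is set only on the inner GridSearchCV, and the outer cross_val_score is
left sequential.
===
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # placeholder data

# Parallelize only the innermost object (the grid search) ...
inner_gs = GridSearchCV(SVC(),
                        param_grid={'C': [0.1, 1, 10, 100]},  # placeholder grid
                        cv=5,
                        n_jobs=4)  # e.g. 4 processors

# ... and keep the outer CV loop sequential (default n_jobs).
outer_scores = cross_val_score(inner_gs, X, y, cv=5)
print(outer_scores.mean())
===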
I had not thought about the n_jobs parameter, mainly because it does not
run on my Mac and the system just hangs if I use it.
The same code runs on a Linux server, though.
I have one more clarification to seek.
I was running it on the server with this code. Would this be fine, or should I
move the n_jobs=3 to…
You are welcome, and I am glad to hear that it works :). And "your" approach is
definitely the cleaner way to do it… I think you just need to be a bit careful
about the n_jobs parameter in practice; I would only set it to n_jobs=-1 in the
inner loop.
Best,
Sebastian
> On May 12, 2016, at 7:1
Thanks.
Actually, there were two people running the same experiments, and the other
person was doing it as you have shown above.
We were getting the same results, but since the methods were different, I
wanted to ensure that I am doing it the right way.
Thanks,
Amita
On Thu, May 12, 2016 at 2:43 PM, Sebastia
I see; that’s what I thought. At first glance, the approach (code) looks
correct to me, but I haven’t done it this way yet. Typically, I use a more
“manual” approach, iterating over the outer folds myself (since I typically
use nested CV for algo selection):
gs_est = … your gridsearch, pipeline …
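A sketch of what such a “manual” outer loop might look like (my own
illustration, not the code from the thread; the pipeline, the grid, and the
fold counts are placeholder choices, and gs_est stands in for whatever grid
search / pipeline object you actually use):
===
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=1)  # placeholder data

# gs_est = ... your grid search / pipeline; a placeholder version:
pipe = Pipeline([('scale', StandardScaler()), ('clf', SVC())])
gs_est = GridSearchCV(pipe, param_grid={'clf__C': [0.1, 1, 10]}, cv=3)

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    gs_est.fit(X[train_idx], y[train_idx])       # inner loop: tune hyperparameters
    outer_scores.append(gs_est.score(X[test_idx], y[test_idx]))
    # gs_est.best_params_ can be inspected per outer fold (handy for algo selection)

print(np.mean(outer_scores), np.std(outer_scores))
===
Inspecting best_params_ per outer fold is what makes this manual form
convenient for algorithm selection, since you can see whether the tuned
settings are stable across folds.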
Actually, I do not have an independent test set, and hence I want to use it
as an estimate of the generalization performance. My classifier is fixed to an
SVM, and I want to learn the parameters and also estimate an unbiased
performance using only one set of data.
I wanted to ensure that my code is correct.
I would say there are 2 different applications of nested CV. You could use it
for algorithm selection (with hyperparam tuning in the inner loop). Or, you
could use it as an estimate of the generalization performance (only hyperparam
tuning), which has been reported to be less biased than a k…
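To make the two applications concrete, here is a hedged sketch (my own
example, not from the thread; both algorithms, their grids, and the toy data
are placeholders): the outer-loop mean estimates generalization performance
for one fixed algorithm, and running the same nested scheme for several
algorithms turns it into algorithm selection.
===
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

# Application 2: generalization estimate for ONE fixed algorithm
# (only hyperparameter tuning happens in the inner loop).
svm_gs = GridSearchCV(SVC(), {'C': [0.1, 1, 10, 100]}, cv=5)
svm_scores = cross_val_score(svm_gs, X, y, cv=5)

# Application 1: algorithm selection -- repeat the same nested scheme
# for other candidates and compare the outer-loop estimates.
tree_gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
                       {'max_depth': [2, 4, 8, None]}, cv=5)
tree_scores = cross_val_score(tree_gs, X, y, cv=5)

print('SVM : %.3f +/- %.3f' % (svm_scores.mean(), svm_scores.std()))
print('Tree: %.3f +/- %.3f' % (tree_scores.mean(), tree_scores.std()))
===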
Hi Amita,
As far as I understand your question, you only need one CV loop to optimize
your objective with the scoring function provided:
===
pipeline = Pipeline([('scale', preprocessing.StandardScaler()),
                     ('filter', SelectKBest(f_regression)),
                     ('svr', svm.SVR())])
C_range = [0.1, 1, 10, 100]
gamma_range = n…
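The snippet is cut off in the archive; a possible continuation under my own
assumptions (the gamma values, the filter__k choices, and the scoring string
are placeholders, and the imports follow the current scikit-learn layout)
would wrap the pipeline in a single GridSearchCV:
===
import numpy as np
from sklearn import preprocessing, svm
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('scale', preprocessing.StandardScaler()),
                     ('filter', SelectKBest(f_regression)),
                     ('svr', svm.SVR())])

C_range = [0.1, 1, 10, 100]
gamma_range = np.logspace(-3, 1, 5)   # placeholder; the original values are cut off

param_grid = {'svr__C': C_range,
              'svr__gamma': gamma_range,
              'filter__k': [5, 10, 'all']}  # placeholder k values

# One CV loop that optimizes the chosen scoring function:
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error')
# grid.fit(X, y)   # X, y: your regression data
# print(grid.best_params_, grid.best_score_)
===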
Hi Sebastian,
Sorry, maybe I was a little bit unclear; what I meant was scenario 2)
> in contrast to 1) below:
>
> 1) perform k-fold cross-validation on the complete dataset for model
> selection and then report the score as an estimate of the model's performance
> (not a good idea!)
>
if you me
>
> On the other hand, if they change, one cannot really calculate the average
> performance from the outer KFold scores.
>
>
Why not? If one sees the GridSearchCV(simple_estimator) as "the best that
simple_estimator can do if we let it try several parameters", then
everything becomes consistent. Y
Hi, Satrajit,
> In general, what would speak against an approach to just split the initial
> dataset into train/test (70/30), perform grid search (via k-fold CV) on the
> training set, and evaluate the model performance on the test dataset?
>
> isn't this what the cross-val score really does?
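For reference, a small sketch of that scheme (my own illustration; the toy
data and the grid are placeholders): a 70/30 split, grid search via k-fold CV
on the training part only, and one final evaluation on the held-out test part.
===
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

# 70/30 split; the grid search (k-fold CV) sees only the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

gs = GridSearchCV(SVC(), {'C': [0.1, 1, 10, 100]}, cv=5)  # placeholder grid
gs.fit(X_train, y_train)

# Single estimate of generalization performance on the untouched test set.
print(gs.best_params_, gs.score(X_test, y_test))
===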
Thanks. However, the GridSearch may be very expensive, considering that the
parameters may not change across the different folds in the nested approach. On
the other hand, if they change, one cannot really calculate the average
performance from the outer KFold scores.
> On May 11, 2015, at 9:41 AM, Mic
Hi Sebastian,
I am wondering how to "use" or "interpret" those scores. For example, if
> the gamma parameters are set differently in the inner loops, we accumulate
> test scores from the outer loops that would correspond to different models,
> and calculating the average performance from those scores…
Sorry, I misread what you wrote. Your suggested approach is perfectly fine
and corresponds exactly to what would happen if you did the mentioned
cross_val_score + GridSearchCV on a train-test split of one 70-30 fold.
Doing it several times using e.g. an outer KFold just gives you several
scores to…
On Mon, May 11, 2015 at 3:30 PM, Sebastian Raschka
wrote:
> Hi,
> I stumbled upon the brief note about nested cross-validation in the online
> documentation at
> http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html#grid-search
> =
> Nested cross-v