Combining the helpful suggestions of Andy & Joel, I'm trying the following:

# Make a scoring function for the pipeline
v_measure_scorer = make_scorer(v_measure_score,
                               labels_true=labels[:,0],
                               labels_pred=kmeans.predict)

# Parameters of pipelines are set using ‘__’ separated parameter names:
estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
                         scoring=v_measure_scorer)
estimator.fit(D_scaled)

Was this what you were referring to, Andy?

Thanks,

Lee.

On Wed, May 14, 2014 at 1:27 AM, Andreas Mueller <t3k...@gmail.com> wrote:

> I think you should use the make_scorer function. Using labels_ will not
> work, as it will only have labels for the training split, while the
> performance is measured on the test split.
>
> On May 14, 2014 2:28 AM, "Joel Nothman" <joel.noth...@gmail.com> wrote:
>>
>> Hi Lee,
>>
>> The scoring parameter, if not an existing scoring name, needs to be a
>> function with the signature:
>>
>> fn(estimator, X, y_true) -> score which increases with goodness
>>
>> So I think you want to define:
>>
>> def score_clusters(estimator, X, y):
>>     return v_measure_score(y[:,0], kmeans.labels_)
>>
>> Then construct the GridSearchCV as:
>>
>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>                          scoring=score_clusters)
>>
>> It seems like there should be more predefined scorers available for
>> clustering...
>>
>> Cheers,
>>
>> - Joel
>>
>>
>> On 14 May 2014 09:10, Lee Zamparo <zamp...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to use GridSearchCV and Pipeline to tune the gamma
>>> parameter of kernel PCA. I'd like to use kernel PCA to transform the
>>> data, followed by kmeans to cluster the data, followed by v-measure to
>>> measure the goodness of fit of the clustering.
>>>
>>> Here's the relevant snippet of my script
>>> -----
>>> # Set up the kPCA -> kmeans -> v-measure pipeline
>>> kpca = KernelPCA(kernel="rbf")
>>> kmeans = KMeans(n_clusters=3)
>>> pipe = Pipeline(steps=[('kpca', kpca), ('kmeans', kmeans)])
>>>
>>> # Range of parameters to consider for gamma in the RBF kernel for kPCA
>>> gammas = np.logspace(-10,2,num=100)
>>>
>>> # Parameters of pipelines are set using ‘__’ separated parameter names:
>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>> scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>> estimator.fit(D_scaled)
>>>
>>> -----
>>>
>>> Yet I get an AttributeError claiming that the kmeans object has no
>>> labels_ attribute.
>>>
>>>   File "/home/lee/projects/SdA_reduce/utils/kernel_pca_pipeline.py",
>>> line 86, in <module>
>>>     estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>> scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>
>>> AttributeError: 'KMeans' object has no attribute 'labels_'
>>>
>>> Does anyone have any tips on how I should restructure my snippet to
>>> get my desired outcome?
>>>
>>> Thanks,
>>>
>>> Lee.
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>> Get unparalleled scalability from the best Selenium testing platform
>>> available
>>> Simple to use. Nothing to install. Get started now for free."
>>> http://p.sf.net/sfu/SauceLabs
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
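
For anyone finding this thread later, here is a minimal sketch that combines Andreas's and Joel's suggestions. It rests on a few assumptions. Synthetic make_blobs data stands in for D_scaled and labels[:,0]. The gamma grid is shortened so the example runs quickly. It also uses the modern sklearn.model_selection import path, since GridSearchCV lived in sklearn.grid_search in 2014-era releases.

The sketch illustrates two points. First, the scorer must use the fitted estimator that GridSearchCV hands it, never the module-level kmeans object, which GridSearchCV only clones and never fits. Second, the ground-truth labels must be passed to fit() so that every test fold has a y_true to score against. The sketch shows both the make_scorer route and a plain callable with the fn(estimator, X, y_true) signature. score_clusters is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for D_scaled and labels[:, 0] (3 true clusters).
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
D_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pipe = Pipeline(steps=[
    # eigen_solver="dense" keeps the decomposition deterministic across runs
    ("kpca", KernelPCA(kernel="rbf", n_components=2, eigen_solver="dense")),
    ("kmeans", kmeans),
])
gammas = np.logspace(-3, 1, num=5)  # assumption: small grid, for speed only

# Option 1: make_scorer. GridSearchCV calls scorer(fitted_pipe, X_test, y_test),
# which computes v_measure_score(y_test, fitted_pipe.predict(X_test)).
v_measure_scorer = make_scorer(v_measure_score)

# Option 2: a plain callable with the fn(estimator, X, y_true) signature.
# It scores the *fitted* estimator passed in, not the module-level `kmeans`.
def score_clusters(estimator, X_test, y_test):  # hypothetical helper name
    return v_measure_score(y_test, estimator.predict(X_test))

cv = KFold(n_splits=3, shuffle=True, random_state=0)
results = {}
for name, scoring in [("make_scorer", v_measure_scorer),
                      ("callable", score_clusters)]:
    search = GridSearchCV(pipe, dict(kpca__gamma=gammas),
                          scoring=scoring, cv=cv)
    # Pass the ground-truth labels so each test fold has y_true to score.
    search.fit(D_scaled, y_true)
    results[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))

# GridSearchCV fits clones, so the original `kmeans` never gets labels_.
print(hasattr(kmeans, "labels_"))
```

Both options should select the same gamma, because make_scorer(v_measure_score) performs the same predict-then-score call as the hand-written function. Recent scikit-learn releases also accept scoring='v_measure_score' as a predefined string.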