Re: [Scikit-learn-general] GridSearchCV + Pipeline with v_measure

Lee Zamparo Wed, 14 May 2014 08:12:43 -0700

Combining the helpful suggestions of Andy & Joel, I'm trying the following:


# Make a scoring function for the pipeline
v_measure_scorer = make_scorer(v_measure_score,
                               labels_true=labels[:,0],
                               labels_pred=kmeans.predict)

# Parameters of pipelines are set using ‘__’ separated parameter names:
estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
                         scoring=v_measure_scorer)
estimator.fit(D_scaled)

Was this what you were referring to Andy?

Thanks,

Lee.
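
For reference, a minimal sketch of the scorer setup discussed in this thread, not a definitive answer. It assumes a recent scikit-learn (GridSearchCV imported from sklearn.model_selection) and uses synthetic make_blobs data with the names X and y as stand-ins for D_scaled and labels[:,0]. The true labels go to fit() rather than into the scorer, and predictions come from calling predict() on each held-out split, which is the point Andy makes below about labels_.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 150 points in 3 labelled blobs.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Same kPCA -> kmeans pipeline shape as in the thread.
pipe = Pipeline(steps=[
    ("kpca", KernelPCA(kernel="rbf", n_components=2)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])


def score_clusters(estimator, X, y_true):
    """Callable scorer: (estimator, X, y_true) -> score, higher is better."""
    # predict() assigns the held-out points to the fitted clusters,
    # so no labels_ attribute from the training split is needed.
    return v_measure_score(y_true, estimator.predict(X))


# make_scorer builds an equivalent scorer from the bare metric.
v_measure_scorer = make_scorer(v_measure_score)

# A small illustrative grid; the thread uses a much wider range.
gammas = np.logspace(-3, 0, num=4)
search = GridSearchCV(pipe, {"kpca__gamma": gammas},
                      scoring=v_measure_scorer, cv=3)

# y is passed here so each CV split hands its own labels to the scorer.
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Either score_clusters or v_measure_scorer can be given as the scoring argument. The transformer and clusterer ignore y during fitting; it is only used to score each test split.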

On Wed, May 14, 2014 at 1:27 AM, Andreas Mueller <t3k...@gmail.com> wrote:
> I think you should use the make_scorer function. Using labels_ will not
> work, as it will only have labels for the training split, while the
> performance is measured on the test split.
>
> On May 14, 2014 2:28 AM, "Joel Nothman" <joel.noth...@gmail.com> wrote:
>>
>> Hi Lee,
>>
>> The scoring parameter, if not an existing scoring name, needs to be a
>> function with the signature:
>>
>> fn(estimator, X, y_true) -> score which increases with goodness
>>
>> So I think you want to define:
>>
>> def score_clusters(estimator, X, y):
>>     return v_measure_score(y[:,0], kmeans.labels_)
>>
>> Then construct the GridSearchCV as:
>>
>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>                          scoring=score_clusters)
>>
>> It seems like there should be more predefined scorers available for
>> clustering...
>>
>> Cheers,
>>
>> - Joel
>>
>>
>> On 14 May 2014 09:10, Lee Zamparo <zamp...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to use GridSearchCV and Pipeline to tune the gamma
>>> parameter of kernel PCA.  I'd like to use kernel PCA to transform the
>>> data, followed by kmeans to cluster the data, followed by v-measure to
>>> measure the goodness of fit of the clustering.
>>>
>>> Here's the relevant snippet of my script
>>> -----
>>> # Set up the kPCA -> kmeans -> v-measure pipeline
>>> kpca = KernelPCA(kernel="rbf")
>>> kmeans = KMeans(n_clusters=3)
>>> pipe = Pipeline(steps=[('kpca', kpca), ('kmeans', kmeans)])
>>>
>>> # Range of parameters to consider for gamma in the RBF kernel for kPCA
>>> gammas = np.logspace(-10,2,num=100)
>>>
>>> # Parameters of pipelines are set using ‘__’ separated parameter names:
>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>                          scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>> estimator.fit(D_scaled)
>>>
>>> -----
>>>
>>> Yet I get an AttributeError claiming that the kmeans object has no
>>> labels_ attribute.
>>>
>>> File "/home/lee/projects/SdA_reduce/utils/kernel_pca_pipeline.py",
>>> line 86, in <module>
>>>   estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>> scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>
>>> AttributeError: 'KMeans' object has no attribute 'labels_'
>>>
>>> Does anyone have any tips on how I should restructure my snippet to
>>> get my desired outcome?
>>>
>>> Thanks,
>>>
>>> Lee.
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>> Get unparalleled scalability from the best Selenium testing platform
>>> available
>>> Simple to use. Nothing to install. Get started now for free."
>>> http://p.sf.net/sfu/SauceLabs
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
