Again, it's probably over the top, but I think it's a useful interface
(prototyped at https://github.com/jnothman/scikit-learn/tree/search_results):

>>> from __future__ import print_function
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import SVC
>>> iris = load_iris()
>>> grid = {'C': [0.01, 0.1, 1], 'degree': [1, 2, 3]}
>>> search = GridSearchCV(SVC(kernel='poly'),
...                       param_grid=grid).fit(iris.data, iris.target)
>>> res = search.results_
>>> res.best().mean_test_score
0.97333333333333338
>>> res
<9 candidates. Best results:
  <0.973 for {'C': 0.10000000000000001, 'degree': 3}>,
  <0.967 for {'C': 1.0, 'degree': 3}>,
  <0.967 for {'C': 1.0, 'degree': 2}>, ...>
>>> for tup in res.zipped('parameters', 'mean_test_score',
...                       'std_test_score'):
...     print(*tup)
...
{'C': 0.01, 'degree': 1} 0.673333333333 0.033993463424
{'C': 0.01, 'degree': 2} 0.926666666667 0.00942809041582
{'C': 0.01, 'degree': 3} 0.966666666667 0.0188561808316
{'C': 0.10000000000000001, 'degree': 1} 0.94 0.0163299316186
{'C': 0.10000000000000001, 'degree': 2} 0.966666666667 0.0188561808316
{'C': 0.10000000000000001, 'degree': 3} 0.973333333333 0.00942809041582
{'C': 1.0, 'degree': 1} 0.966666666667 0.0249443825785
{'C': 1.0, 'degree': 2} 0.966666666667 0.00942809041582
{'C': 1.0, 'degree': 3} 0.966666666667 0.0188561808316
>>>

On Sun, Jun 9, 2013 at 12:46 PM, Joel Nothman
<jnoth...@student.usyd.edu.au> wrote:

> On Sun, Jun 9, 2013 at 12:38 PM, Joel Nothman <
> jnoth...@student.usyd.edu.au> wrote:
>
>>
>> This may be getting into crazy land, and certainly close to
>> reimplementing Pandas for the 2d case, or recarrays with benefits, but:
>> imagine we had a SearchResult object with:
>> * attributes like fold_test_score, fold_train_score, fold_train_time,
>> each a 2d array.
>> * __getattr__ magic that produced mean_test_score, mean_train_time, etc.
>> and std_test_score, std_train_time on demand (weighted by some
>> samples_per_fold attr if iid=True).
>> * attributes like param_C that would enable selecting certain candidates
>> by their parameter settings (through numpy-style boolean queries).
>> * __getitem__ that can pull out one or more candidates by index (and
>> returns a SearchResult).
>> * a method that returns a dict of selected 1d array attributes for
>> Pandas-style (or spreadsheet? in that case a list of dicts) integration
>> * a method that zips over selected attributes for simple iteration.
>>
>
> And sure, a method that performs
> self[np.argsort(self.mean_test_score)[-k:]] to get the k best results...
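
For concreteness, here is a rough sketch of that object, covering only the
test-score attributes; the names and the missing iid weighting are
placeholders rather than a settled API:

import numpy as np

class SearchResult(object):
    """Illustrative only: per-fold results for n_candidates x n_folds."""

    def __init__(self, parameters, fold_test_score):
        self.parameters = list(parameters)                   # list of param dicts
        self.fold_test_score = np.asarray(fold_test_score)   # 2d array

    def __getattr__(self, name):
        # mean_* / std_* computed on demand from the matching fold_* array
        # (the iid-weighted variant is left out here)
        for prefix, func in (('mean_', np.mean), ('std_', np.std)):
            if name.startswith(prefix):
                folds = self.__dict__.get('fold_' + name[len(prefix):])
                if folds is not None:
                    return func(folds, axis=1)
        raise AttributeError(name)

    def __getitem__(self, index):
        # select one or more candidates, returning a SearchResult
        index = np.atleast_1d(index)
        return SearchResult([self.parameters[i] for i in index],
                            self.fold_test_score[index])

    def zipped(self, *names):
        # iterate over the selected attributes candidate by candidate
        return zip(*(getattr(self, name) for name in names))

    def best(self, k=1):
        # the k candidates with the highest mean test score
        return self[np.argsort(self.mean_test_score)[::-1][:k]]

The param_C-style attributes and boolean selection could be layered on top
of __getitem__ in the same way.
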
>
>
>> - Joel
>>
>> On Fri, Jun 7, 2013 at 8:02 PM, Olivier Grisel 
>> <olivier.gri...@ensta.org> wrote:
>>
>>> TL;DR: the parameter search results data structure choice should
>>> anticipate new use-cases
>>>
>>> Thanks Joel for the detailed analysis.
>>>
>>> In the current situation, I think I myself like:
>>>
>>> 5. many attributes, each an array, on a custom results object
>>>
>>> This makes it possible to write a `__repr__` method on that object
>>> that could write a statistical summary of the top 10 or so candidate
>>> parameterizations.
>>>
>>> I think we should keep `best_params_`, `best_estimator_` and
>>> `best_score_` as quick access convenience accessors even if they are
>>> redundant with the detailed content of the search results.
>>>
>>> However, to move the discussion forward on the model evaluation results,
>>> there are three additional use-cases not addressed by the current
>>> design that I would like to have addressed somehow at some point in
>>> the future:
>>>
>>> A- Fault tolerance and handling missing results caused by evaluation
>>> errors
>>>
>>> How to handle partial results? Sometimes some combinations of the
>>> parameters will trigger runtime errors, for instance if the evaluation
>>> raises an exception because the estimator fails to converge
>>> (ill-conditioning), hits numeric overflow / underflow (apparently this
>>> can happen in our SGD Cython code and raises a ValueError,
>>> to be debugged) or runs out of memory...
>>>
>>> I think the whole search should not crash if one evaluation fails
>>> after 3 hours of computation and many successful evaluations. The
>>> error should be collected and the evaluation iteration should be
>>> excluded from the final results statistics.
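
For A, I would picture the collection loop looking something like the
following; `estimator`, `X`, `y`, `cv`, `candidate_parameters` and the
`fit_and_score` helper are just placeholder names, not existing API:

evaluations = []
for parameters in candidate_parameters:
    for cv_iter_index, (train, test) in enumerate(cv):
        try:
            score = fit_and_score(estimator, X, y, parameters, train, test)
        except Exception as exc:
            # record the failure and keep going; this (parameters, fold)
            # pair is simply excluded from the aggregate statistics
            evaluations.append({'parameters': parameters,
                                'cv_iter_index': cv_iter_index,
                                'error': repr(exc)})
            continue
        evaluations.append({'parameters': parameters,
                            'cv_iter_index': cv_iter_index,
                            'validation_score': score})
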
>>>
>>> B- Being able to monitor partial results and interrupt the search before
>>> waiting for the end (e.g. by handling KeyboardInterrupt using an async
>>> job scheduling API)
>>>
>>> Also, even if the current joblib API does not allow for that, I think
>>> it would be very useful to make it possible at some point for the
>>> user to monitor the current progress of the search and to
>>> interrupt it without losing access to the evaluation results
>>> collected up to that point.
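
For B, something along these lines, assuming a hypothetical async scheduling
API that yields future-like objects (`schedule_evaluations` and `aggregate`
are made up; joblib does not currently expose anything like this):

finished = []
try:
    for future in schedule_evaluations(candidate_parameters, cv):
        # as each (parameters, fold) evaluation completes, keep its result
        finished.append(future.result())
except KeyboardInterrupt:
    # stop scheduling new evaluations, but keep everything already
    # collected so partial results remain available for inspection
    pass
partial_results = aggregate(finished)
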
>>>
>>> C- Being able to warm-start a search with previously collected results
>>>
>>> C1: Refining the search space: submit a new grid or parameter sampler
>>> that focuses the search at a finer scale around an interesting area in
>>> existing dimensions and optionally trims dimensions that are deemed
>>> useless by the user according to the past results.
>>>
>>> C2: Refining the cross-validation: the user might want to perform a
>>> first search with a very low number of CV iterations (e.g. 1 or 2
>>> iterations of shuffle split) to have a coarse overview of the
>>> interesting part of the search space, then trim the parameter grid to
>>> a smaller yet promising grid and then add more CV iterations only for
>>> those parameters so as to be able to get finer estimates of the mean
>>> validation scores by reducing the standard error of the mean across
>>> random CV folds.
>>>
>>> Note: C2 is only useful for (Stratified)ShuffleSplit cross-validation,
>>> where you can grow n_iter or change random_state to get as many CV
>>> splits as you want provided the dataset is large enough.
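
In terms of user-facing API, C could look roughly like this; the
`warm_start_results` argument is entirely hypothetical and only illustrates
handing previously collected evaluations to a new search:

from sklearn.cross_validation import ShuffleSplit
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

# coarse pass: wide grid, only 2 ShuffleSplit iterations
coarse = GridSearchCV(SVC(), param_grid={'C': [0.01, 0.1, 1, 10, 100]},
                      cv=ShuffleSplit(len(y), n_iter=2, random_state=0))
coarse.fit(X, y)

# C1: finer grid around the promising region, reusing past evaluations
fine = GridSearchCV(SVC(), param_grid={'C': [0.05, 0.1, 0.5, 1]},
                    cv=ShuffleSplit(len(y), n_iter=2, random_state=0),
                    warm_start_results=coarse.results_)  # hypothetical
fine.fit(X, y)

# C2: same trimmed grid, more CV iterations to shrink the standard error
more_cv = GridSearchCV(SVC(), param_grid={'C': [0.05, 0.1, 0.5, 1]},
                       cv=ShuffleSplit(len(y), n_iter=10, random_state=1),
                       warm_start_results=fine.results_)  # hypothetical
more_cv.fit(X, y)
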
>>>
>>> In order to be able to address A, B and C in the future, I think the
>>> estimator object should adopt a simple primary data structure that is a
>>> growable list of individual (parameter, CV-fold)-scoped evaluations,
>>> and then provide the user with methods to simply introspect them, such
>>> as: find the top 10 parameters by average validation score across the
>>> currently available CV folds (some CV folds could be missing due to
>>> partial evaluation caused by A (failures) or B (interrupted
>>> computation)).
>>>
>>> Each item in this list could have:
>>>
>>> - parameters_id: unique parameter set integer identifier (e.g. a deep
>>> hash or random index)
>>> - parameters: the parameter settings dict
>>> - cv_id: unique CV object integer identifier (hash of the CV
>>> object or random index)
>>> - cv_iter_index: the CV fold iteration integer index
>>> - validation_score: the primary validation score (to be used for
>>> ranking models)
>>>
>>> Optional attributes we could add in the future:
>>>
>>> - training score to be able to estimate under-fitting (if non-zero)
>>> and over-fitting by diffing with the validation score
>>> - more training and validation scores (e.g. precision, recall, AUC...)
>>> - more evaluation metrics that are not scores but are useful for model
>>> analysis (e.g. a confusion matrix for classification)
>>> - fitting time
>>> - prediction time (could be complicated to separate out of the complete
>>> scoring time due to our Scorer API that currently hides it).
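
Spelled out, one item of that growable list could be as simple as the
following (using the field names above; the optional attributes could be
appended later without changing the structure):

from collections import namedtuple

# one record per (parameter set, CV fold) evaluation
Evaluation = namedtuple('Evaluation', [
    'parameters_id',     # unique parameter-set identifier (hash or index)
    'parameters',        # the parameter settings dict
    'cv_id',             # unique CV object identifier
    'cv_iter_index',     # CV fold iteration index
    'validation_score',  # primary validation score, used for ranking
])
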
>>>
>>> Then, to compute the mean score for a given parameter set, one could
>>> group by parameters_id (e.g. using a Python `defaultdict(list)` with
>>> parameters_id as key).
>>> Advanced users could also convert this log of evaluations into a pandas
>>> dataframe and then do joins / group-bys themselves to compute various
>>> aggregate statistics across the dimensions of their choice.
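
i.e., assuming a flat `evaluations` list of such records, something like:

from collections import defaultdict
import numpy as np

# group the flat evaluation log by parameter set and average whatever
# folds are available (failed or missing folds simply do not appear)
scores_by_parameters = defaultdict(list)
for ev in evaluations:
    scores_by_parameters[ev.parameters_id].append(ev.validation_score)

mean_scores = dict((parameters_id, np.mean(scores))
                   for parameters_id, scores in scores_by_parameters.items())

# or, for pandas users:
import pandas as pd
df = pd.DataFrame([ev._asdict() for ev in evaluations])
summary = df.groupby('parameters_id')['validation_score'].agg(['mean', 'std'])
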
>>>
>>> Finally there is an additional use case that I have in mind, even if it
>>> is possibly less of a priority than the others:
>>>
>>> D: warm starting with larger subsamples of the dataset
>>>
>>> Make it possible to start the search on a small subsample of the
>>> dataset (e.g. 10% of the complete dataset), then on a larger
>>> subset (e.g. 20% of the dataset), to be able to identify the most
>>> promising parameterizations quickly and evaluate how sensitive they
>>> are to a doubling of the dataset size. That would make it
>>> possible to select a smaller grid for a parameter search on the full
>>> dataset and also to compute learning curves for
>>> bias-variance analysis of the individual parameters.
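
A minimal sketch of D with nested subsamples (make_classification is only a
stand-in for a big dataset; the warm-start hand-off itself is not shown):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

# a stand-in for a dataset that is expensive to search exhaustively
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
grid = {'C': [0.01, 0.1, 1, 10]}

# nested 10% / 20% subsamples: the smaller sample is contained in the
# larger one, so the comparison across sizes stays consistent
rng = np.random.RandomState(0)
permutation = rng.permutation(X.shape[0])
small = permutation[:int(0.1 * X.shape[0])]
medium = permutation[:int(0.2 * X.shape[0])]

search_small = GridSearchCV(SVC(), param_grid=grid).fit(X[small], y[small])
search_medium = GridSearchCV(SVC(), param_grid=grid).fit(X[medium], y[medium])
# comparing the two searches (e.g. via the results_ object prototyped at the
# top of this thread) shows which parameterizations stay promising as the
# sample size doubles, before committing to the full dataset
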
>>>
>>> --
>>> Olivier
>>>
>>>