On Sun, Jun 9, 2013 at 12:38 PM, Joel Nothman
<jnoth...@student.usyd.edu.au> wrote:
>
> This may be getting into crazy land, and certainly close to reimplementing
> Pandas for the 2d case, or recarrays with benefits, but: imagine we had a
> SearchResult object with:
> * attributes like fold_test_score, fold_train_score, fold_train_time, each
> a 2d array.
> * __getattr__ magic that produced mean_test_score, mean_train_time, etc.
> and std_test_score, std_train_time on demand (weighted by some
> samples_per_fold attr if iid=True).
> * attributes like param_C that would enable selecting certain candidates
> by their parameter settings (through numpy-style boolean queries).
> * __getitem__ that can pull out one or more candidates by index (and
> returns a SearchResult).
> * a method that returns a dict of selected 1d array attributes for
> Pandas-style integration (or, for spreadsheet-style export, a list of dicts)
> * a method that zips over selected attributes for simple iteration.
>
And sure, a method that performs
self[np.argsort(self.mean_test_score)[-k:]] to get the k best results...
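
To make that concrete, here is a rough, completely untested sketch of such
an object (attribute and method names are just placeholders):

import numpy as np


class SearchResult(object):
    """Per-candidate, per-fold search results (untested sketch)."""

    def __init__(self, parameters, fold_test_score, fold_train_time):
        # one parameter settings dict per candidate
        self.parameters = parameters
        # 2d arrays of shape (n_candidates, n_folds)
        self.fold_test_score = np.asarray(fold_test_score)
        self.fold_train_time = np.asarray(fold_train_time)

    def __getattr__(self, name):
        # derive mean_* / std_* lazily from the matching fold_* array,
        # and param_<name> arrays from the parameter dicts
        if name.startswith('mean_'):
            return getattr(self, 'fold_' + name[5:]).mean(axis=1)
        if name.startswith('std_'):
            return getattr(self, 'fold_' + name[4:]).std(axis=1)
        if name.startswith('param_'):
            return np.array([p.get(name[6:]) for p in self.parameters])
        raise AttributeError(name)

    def __getitem__(self, index):
        # select candidates by index, fancy index or boolean mask and
        # return a new SearchResult restricted to them
        index = np.atleast_1d(index)
        if index.dtype == bool:
            index = np.flatnonzero(index)
        return SearchResult([self.parameters[i] for i in index],
                            self.fold_test_score[index],
                            self.fold_train_time[index])

    def top(self, k=1):
        # the k best candidates by mean validation score
        return self[np.argsort(self.mean_test_score)[-k:]]


result = SearchResult([{'C': 1.0}, {'C': 10.0}],
                      [[0.8, 0.9], [0.7, 0.6]],
                      [[0.1, 0.1], [0.2, 0.3]])
result[result.param_C == 1.0].mean_test_score   # -> array([ 0.85])
result.top(1).parameters                        # -> [{'C': 1.0}]
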
> - Joel
>
> On Fri, Jun 7, 2013 at 8:02 PM, Olivier Grisel
> <olivier.gri...@ensta.org> wrote:
>
>> TL;DR: the parameter search results datastructure choice should
>> anticipate new use-cases
>>
>> Thanks Joel for the detailed analysis.
>>
>> In the current situation, I think I myself like:
>>
>> 5. many attributes, each an array, on a custom results object
>>
>> This makes it possible to write a `__repr__` method on that object
>> that could write a statistical summary of the top 10 or so candidate
>> parameterizations.
>>
>> I think we should keep `best_params_`, `best_estimator_` and
>> `best_score_` as quick-access convenience accessors even if they are
>> redundant with the detailed content of the search results.
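
+1, they keep the common case as trivial as it is today, e.g. roughly:

from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}).fit(iris.data, iris.target)
print(search.best_params_)
print(search.best_score_)
search.best_estimator_.predict(iris.data[:5])
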
>>
>> However, to move the discussion forward on the model evaluation
>> results, there are three additional use-cases not addressed by the
>> current design that I would like to see addressed somehow at some
>> point in the future:
>>
>> A- Fault tolerance and handling missing results caused by evaluation
>> errors
>>
>> How to handle partial results? Sometimes some combinations of the
>> parameters will trigger runtime errors, for instance if the evaluation
>> raises an exception because the estimator fails to converge
>> (ill-conditioning), hits a numeric overflow / underflow (apparently
>> this can happen in our SGD Cython code and raises a ValueError, to be
>> debugged), or runs out of memory...
>>
>> I think the whole search should not crash if one evaluation fails
>> after 3 hours of computation and many successful evaluations. The
>> error should be collected and the evaluation iteration should be
>> excluded from the final results statistics.
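
Big +1. Each individual evaluation could be guarded along these lines
(sketch only; the record layout is just an illustration):

from sklearn.base import clone


def evaluate_one(estimator, parameters, X, y, train, test, scorer):
    # fit and score a single (parameters, CV-fold) combination, turning
    # failures into a recorded error instead of crashing the whole search
    try:
        est = clone(estimator).set_params(**parameters)
        est.fit(X[train], y[train])
        score = scorer(est, X[test], y[test])
        return {'parameters': parameters, 'test_score': score, 'error': None}
    except Exception as exc:
        # collected for reporting; this entry is excluded from the
        # aggregated statistics later on
        return {'parameters': parameters, 'test_score': None,
                'error': repr(exc)}
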
>>
>> B- Being able to monitor partial results and interrupt the search
>> before the end (e.g. by handling KeyboardInterrupt using an async
>> job scheduling API)
>>
>> Also, even if the current joblib API does not allow for that, I think
>> it would be very useful to make it possible at some point for the user
>> to monitor the progress of the search and to interrupt it without
>> losing access to the evaluation results collected up to that point.
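
At least for the sequential case, something as simple as this would
already give B (sketch; `evaluations` would yield one zero-argument
callable per pending (parameters, fold) combination):

def run_search(evaluations, results=None):
    # `results` is the growable evaluation log; passing a previous log
    # back in is also what makes warm-starting (use-case C) cheap
    results = [] if results is None else results
    try:
        for evaluate in evaluations:
            results.append(evaluate())
    except KeyboardInterrupt:
        # stop early but keep everything collected so far
        pass
    return results
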
>>
>> C- Being able to warm-start a search with previously collected results
>>
>> C1: Refining the search space: submit a new grid or parameter sampler
>> that focuses the search at a finer scale around an interesting area of
>> the existing dimensions, and optionally trim dimensions that the user
>> deems useless according to the past results.
>>
>> C2: Refining the cross-validation: the user might want to perform a
>> first search with a very low number of CV iterations (e.g. 1 or 2
>> iterations of shuffle split) to get a coarse overview of the
>> interesting part of the search space, then trim the parameter grid to
>> a smaller yet promising grid, and then add more CV iterations only for
>> those parameters so as to get finer estimates of the mean validation
>> scores by reducing the standard error of the mean across random CV
>> folds.
>>
>> Note: C2 is only useful for the (Stratified)ShuffleSplit cross
>> validation, where you can grow n_iter or change random_state to get as
>> many CV splits as you want, provided the dataset is large enough.
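
Going back to C1 for a second: the refined grid itself is cheap to build
once the promising region is known, e.g. with a small helper like this
(hypothetical, assumes a log-scaled parameter):

import numpy as np


def refine_grid(best, factor=3.0, num=5):
    # zoom in around the best value found by the coarse search,
    # keeping the refined grid log-spaced
    return np.logspace(np.log10(best / factor), np.log10(best * factor),
                       num=num)

refine_grid(10.0)   # -> roughly array([ 3.33, 5.77, 10., 17.32, 30. ])
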
>>
>> In order to be able to address A, B and C in the future, I think the
>> estimator object should adopt a simple primary datastructure that is a
>> growable list of individual (parameter, CV-fold)-scoped evaluations,
>> and then provide the user with methods to simply introspect them, such
>> as: find the top 10 parameters by average validation score across the
>> currently available CV folds (some folds could be missing due to
>> partial evaluation caused by A (failures) or B (interrupted
>> computation)).
>>
>> Each item in this list could have:
>>
>> - parameters_id: unique parameter set integer identifier (e.g. a deep
>> hash or random index)
>> - parameters: the parameter settings dict
>> - cv_id: unique CV object integer identifier (hash of the CV object
>> or random index)
>> - cv_iter_index: the CV fold iteration integer index
>> - validation_score: the value of the primary validation score (to be
>> used for ranking models)
>>
>> Optional attributes we could add in the future:
>>
>> - the training score, to be able to estimate under-fitting (if
>> non-zero) and over-fitting by diffing with the validation score
>> - more training and validation scores (e.g. precision, recall, AUC...)
>> - more evaluation metrics that are not scores but are useful for model
>> analysis (e.g. a confusion matrix for classification)
>> - fitting time
>> - prediction time (could be complicated to separate out of the
>> complete scoring time due to our Scorer API that currently hides it).
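
Concretely, each entry could be a plain dict with those fields (made-up
numbers here, and the optional fields only present when available):

evaluation_log = []   # the growable primary datastructure

evaluation_log.append({
    'parameters_id': 137452,              # hash of the parameter settings
    'parameters': {'C': 1.0, 'gamma': 0.1},
    'cv_id': 98721,                       # hash of the CV object
    'cv_iter_index': 3,
    'validation_score': 0.874,            # primary score, used for ranking
    # optional extras:
    'training_score': 0.951,
    'fit_time': 2.3,                      # in seconds
})
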
>>
>> Then, to compute the mean score for a given parameter set, one could
>> group by parameters_id (e.g. using a Python `defaultdict(list)` with
>> parameters_id as key).
>> Advanced users could also convert this log of evaluations into a
>> pandas dataframe and then do joins / group-bys themselves to compute
>> various aggregate statistics across the dimensions of their choice.
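
For instance (sketch, reusing the `evaluation_log` list of dicts from
above):

from collections import defaultdict

import numpy as np
import pandas as pd

# group the raw per-fold records by candidate, skipping failed evaluations
by_parameters = defaultdict(list)
for record in evaluation_log:
    if record.get('validation_score') is not None:
        by_parameters[record['parameters_id']].append(record)

# mean validation score per candidate, over whatever folds are available
mean_scores = dict((pid, np.mean([r['validation_score'] for r in records]))
                   for pid, records in by_parameters.items())
top_10 = sorted(mean_scores, key=mean_scores.get, reverse=True)[:10]

# or hand the whole log to pandas for ad-hoc analysis
df = pd.DataFrame(evaluation_log)
print(df.groupby('parameters_id')['validation_score'].agg(['mean', 'std']))
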
>>
>> Finally, there is an additional use case that I have in mind, even if
>> it is possibly less of a priority than the others:
>>
>> D: warm starting with larger subsamples of the dataset
>>
>> Make it possible to start the search on a small subsample of the
>> dataset (e.g. 10% of the complete dataset), then continue with a
>> larger subset (e.g. 20% of the dataset), so as to identify the most
>> promising parameterizations quickly and evaluate how sensitive they
>> are to a doubling of the dataset size. That would make it possible to
>> select a smaller grid for a parameter search on the full dataset and
>> also to compute learning curves for bias-variance analysis of the
>> individual parameters.
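
The subsampling side of D is already easy to emulate; a minimal sketch
(nested subsamples via a fixed permutation, search loop left as comments):

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

rng = np.random.RandomState(0)
permutation = rng.permutation(len(y))   # fixed shuffling => nested subsamples

for fraction in (0.1, 0.2, 0.4, 1.0):
    subset = permutation[:int(fraction * len(y))]
    X_sub, y_sub = X[subset], y[subset]
    # run (or warm-start) the parameter search on (X_sub, y_sub), drop
    # clearly hopeless parameter settings between rounds, and keep the
    # per-fraction scores around to plot learning curves afterwards
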
>>
>> --
>> Olivier
>>
>>