Thanks, Olivier. Those are some interesting use-cases:

> A- Fault tolerance and handling missing results caused by evaluation
errors

I don't think this affects the output format, except where we can actually
get partial results for a fold, or if we want to report successful folds
and ignore others for a single candidate parameter setting. But I wonder if
that just makes things much too complicated.

> B: Being able to monitor partial results and interrupt the search
without waiting for the end (e.g. by handling KeyboardInterrupt using an
async job scheduling API)

So the stop and resume case just means the results need to be appendable...?

In general, I don't think Parallel's returning a list is of great benefit
here. Working with an iterable would be more comfortable.

> C1: Refining the search space

Similarly, it should be possible to have fit append further results.

> C2: Refining the cross-validation
and
> D: warm starting with larger subsamples of the dataset

I would think in these cases it's better to create a new estimator and/or
keep results separate.

> Optional attributes we could add in the future:

Something you missed: the ability to get back diagnostics on the quality /
complexity of the model, e.g. coefficient sparsity.

These suggestions do make me consider storage in an external database (a
blob store, or an online spreadsheet) as hyperopt allows. I think "allows"
is important here: when you get to that scale of experimentation, you
probably don't want results logged only in memory. But we need a sensible
default for working with a few thousand candidates.

Except for purity of parallelism, I don't see why you would want to store
each fold result for a single candidate separately. I don't see the
use-case for providing them separately to the user (except where one fold
failed and another succeeded). As far as I'm concerned, the frontend should
hide that.

I do see that providing all fields together for a single candidate is the
most common use-case, which argues against parallel arrays (but not
against a structured array / recarray).

Finally, the single most important thing I can see for making results
explorable is not providing candidate parameter settings only as dicts, but
splitting the dicts out so that you can query by the value of each
parameter and group over the others.
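
Roughly, I mean something like this (a toy illustration only, with
made-up parameter names and scores):

    import numpy as np

    candidates = [
        {'C': 1.0, 'gamma': 0.1},
        {'C': 1.0, 'gamma': 0.01},
        {'C': 10.0, 'gamma': 0.1},
    ]

    # one flat array per parameter instead of a list of opaque dicts
    param_names = sorted(set(name for params in candidates
                             for name in params))
    columns = dict(('param_' + name,
                    np.array([params.get(name) for params in candidates]))
                   for name in param_names)
    mean_test_score = np.array([0.81, 0.79, 0.85])

    # query by the value of one parameter, group over the others
    mask = columns['param_C'] == 10.0
    print(mean_test_score[mask])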

This may be getting into crazy land, and certainly close to reimplementing
Pandas for the 2d case, or recarrays with benefits, but: imagine we had a
SearchResult object (sketched after the list below) with:
* attributes like fold_test_score, fold_train_score, fold_train_time, each
a 2d array.
* __getattr__ magic that produced mean_test_score, mean_train_time, etc.
and std_test_score, std_train_time on demand (weighted by some
samples_per_fold attr if iid=True).
* attributes like param_C that would enable selecting certain candidates by
their parameter settings (through numpy-style boolean queries).
* __getitem__ that can pull out one or more candidates by index (and
returns a SearchResult).
* a method that returns a dict of selected 1d array attributes for
Pandas-style integration (or, for spreadsheets, a list of dicts)
* a method that zips over selected attributes for simple iteration.
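
To make the discussion concrete, here is a rough sketch of that object
(all names hypothetical, only test/train scores covered; I'm not claiming
this is the right implementation):

    import numpy as np

    class SearchResult(object):
        """Hypothetical per-(candidate, fold) results container (sketch)."""

        def __init__(self, parameters, fold_test_score, fold_train_score):
            # list of n_candidates parameter dicts
            self.parameters = parameters
            # 2d arrays of shape (n_candidates, n_folds)
            self.fold_test_score = np.asarray(fold_test_score)
            self.fold_train_score = np.asarray(fold_train_score)

        def __getattr__(self, name):
            # mean_*, std_* and param_* attributes computed on demand
            if name.startswith('mean_'):
                return getattr(self, 'fold_' + name[len('mean_'):]).mean(axis=1)
            if name.startswith('std_'):
                return getattr(self, 'fold_' + name[len('std_'):]).std(axis=1)
            if name.startswith('param_'):
                key = name[len('param_'):]
                return np.array([p.get(key) for p in self.parameters])
            raise AttributeError(name)

        def __getitem__(self, index):
            # select candidates by integer, slice or boolean mask and
            # return a new SearchResult
            indices = np.atleast_1d(np.arange(len(self.parameters))[index])
            return SearchResult([self.parameters[i] for i in indices],
                                self.fold_test_score[indices],
                                self.fold_train_score[indices])

    result = SearchResult([{'C': 1.0}, {'C': 10.0}],
                          [[0.80, 0.82], [0.85, 0.87]],
                          [[0.90, 0.91], [0.99, 0.98]])
    print(result.mean_test_score)                           # [ 0.81  0.86]
    print(result[result.param_C == 10.0].mean_test_score)   # [ 0.86]

(The iid-weighted means and the dict/zip export methods are left out here,
but they would slot in the same way.)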

Is this crazy, or does it do exactly what we want? or both? And how does it
not meet the needs of your wishlist, Olivier (except where the number of
folds differs)?

- Joel

On Fri, Jun 7, 2013 at 8:02 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> TL;DR: the parameter search results datastructure choice should
> anticipate new use-cases
>
> Thanks Joel for the detailed analysis.
>
> In the current situation, I think I myself like:
>
> 5. many attributes, each an array, on a custom results object
>
> This makes it possible to write a `__repr__` method on that object
> that could write a statistical summary of the top 10 or so candidate
> parameterizations.
>
> I think we should keep `best_param_`, `best_estimator_` and
> `best_score_` as quick access convenience accessors even if they are
> redundant with the detailed content of the search results.
>
> However, to move the discussion forward on the model evaluation results,
> there are three additional use-cases not addressed by the current
> design that I would like to have addressed somehow at some point in
> the future:
>
> A- Fault tolerance and handling missing results caused by evaluation errors
>
> How to handle partial results? Sometimes some combinations of the
> parameters will trigger runtime errors, for instance if the evaluation
> raises an exception because the estimator fails to converge
> (ill-conditioning), hits a numeric overflow / underflow (apparently this
> can happen in our SGD cython code and raises a ValueError,
> to be debugged) or runs out of memory...
>
> I think the whole search should not crash if one evaluation fails
> after 3 hours of computation and many successful evaluations. The
> error should be collected and the evaluation iteration should be
> excluded from the final results statistics.
>
> B- Being able to monitor partial results and interrupt the search
> without waiting for the end (e.g. by handling KeyboardInterrupt using an
> async job scheduling API)
>
> Also, even if the current joblib API does not allow for it, I think
> it would be very useful to make it possible at some point for the
> user to monitor the current progress of the search and to interrupt it
> without losing access to the evaluation results collected up to that
> point.
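>
> For instance, here is a rough sketch of the kind of async collection I
> mean, using concurrent.futures (or its `futures` backport on Python 2)
> purely as an illustration, since the current joblib API does not allow
> this; `fit_and_score` is a dummy stand-in for evaluating one
> (parameters, CV fold) combination:
>
>     from concurrent.futures import ProcessPoolExecutor, as_completed
>
>     def fit_and_score(parameters):
>         # dummy stand-in: pretend the sum of the parameter values is
>         # the validation score of that (parameters, CV fold) evaluation
>         return {'parameters': parameters,
>                 'validation_score': sum(parameters.values())}
>
>     def search(candidates):
>         results = []  # grows as evaluations complete
>         with ProcessPoolExecutor() as executor:
>             futures = [executor.submit(fit_and_score, params)
>                        for params in candidates]
>             try:
>                 for future in as_completed(futures):
>                     results.append(future.result())
>             except KeyboardInterrupt:
>                 # keep whatever has been collected so far and stop
>                 # scheduling the remaining evaluations
>                 for future in futures:
>                     future.cancel()
>         return results
>
>     # e.g. results = search([{'C': 1.0}, {'C': 10.0}])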
>
> C- Being able to warm-start a search with previously collected results
>
> C1: Refining the search space: submit a new grid or parameter sampler
> that focuses the search at a finer scale around an interesting area in
> existing dimensions, and optionally trim dimensions that are deemed
> useless by the user according to the past results.
>
> C2: Refining the cross-validation: the user might want to perform a
> first search with a very low number of CV iterations (e.g. 1 or 2
> iterations of shuffle split) to get a coarse overview of the interesting
> part of the search space, then trim the parameter grid to a smaller yet
> promising grid and add more CV iterations only for those parameters, so
> as to get finer estimates of the mean validation scores by reducing the
> standard error of the mean across random CV folds.
>
> Note: C2 is only useful for the (Stratified)ShuffleSplit cross-validation,
> where you can grow n_iter or change random_state to get as many CV splits
> as you want, provided the dataset is large enough.
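>
> For instance, with the current sklearn.cross_validation API (a sketch
> with made-up sizes, if I am not mistaken about the signature):
>
>     from sklearn.cross_validation import ShuffleSplit
>
>     # coarse first pass: only 2 random splits (1000 = n_samples)
>     cv_coarse = ShuffleSplit(1000, n_iter=2, test_size=0.25,
>                              random_state=0)
>     # refinement on the trimmed grid: 10 fresh splits
>     cv_finer = ShuffleSplit(1000, n_iter=10, test_size=0.25,
>                             random_state=1)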
>
> In order to be able to address A, B and C in the future, I think the
> estimator object should adopt a simple primary datastructure that is a
> growable list of individual (parameter, CV-fold)-scoped evaluations,
> and then provide the user with methods to introspect them, such as:
> find the top 10 parameter settings by average validation score across
> the currently available CV folds (some folds could be missing due to
> partial evaluation caused by A (failures) or B (interrupted
> computation)).
>
> Each item in this list could have:
>
> - parameters_id: unique parameter set integer identifier (e.g. a deep
> hash or random index)
> - parameters: the parameter settings dict
> - cv_id: unique CV object integer identifier (hash of the CV
> object or random index)
> - cv_iter_index: the CV fold iteration integer index
> - validation_score_name: the primary validation score (to be used for
> ranking models)
>
> Optional attributes we could add in the future:
>
> - training score to be able to estimate under-fitting (if non-zero)
> and over-fitting by diffing with the validation score
> - more training and validation scores (e.g. precision, recall, AUC...)
> - more evaluation metrics that are not scores but are useful for model
> analysis (e.g. a confusion matrix for classification)
> - fitting time
> - prediction time (could be complicated to separate out of the complete
> scoring time due to our Scorer API, which currently hides it).
>
> Then to compute the mean score for a given parameter set one could
> group by parameters_id (e.g. using a Python `defaultdict(list)` with
> parameters_id as key).
> Advanced users could also convert this log of evaluations into a pandas
> dataframe and then do joins / group-bys themselves to compute various
> aggregate statistics across the dimensions of their choice.
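>
> A minimal sketch of that log and the group-by (field names roughly as
> listed above, values made up):
>
>     from collections import defaultdict
>     import numpy as np
>
>     # one record per (parameter setting, CV fold) evaluation; the list
>     # simply grows as evaluations complete
>     evaluations = [
>         {'parameters_id': 0, 'parameters': {'C': 1.0}, 'cv_id': 0,
>          'cv_iter_index': 0, 'validation_score': 0.80},
>         {'parameters_id': 0, 'parameters': {'C': 1.0}, 'cv_id': 0,
>          'cv_iter_index': 1, 'validation_score': 0.82},
>         {'parameters_id': 1, 'parameters': {'C': 10.0}, 'cv_id': 0,
>          'cv_iter_index': 0, 'validation_score': 0.85},
>         # second fold for C=10.0 missing (failed or interrupted)
>     ]
>
>     scores_by_params = defaultdict(list)
>     for evaluation in evaluations:
>         scores_by_params[evaluation['parameters_id']].append(
>             evaluation['validation_score'])
>
>     # rank by mean validation score across the folds available so far
>     ranking = sorted(((np.mean(scores), parameters_id)
>                       for parameters_id, scores in scores_by_params.items()),
>                      reverse=True)
>     print(ranking[:10])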
>
> Finally, there is an additional use case that I have in mind, even if
> it is possibly less of a priority than the others:
>
> D: warm starting with larger subsamples of the dataset
>
> Make it possible to start the search on a small subsample of the
> dataset (e.g. 10% of the complete dataset), then continue with a larger
> subset (e.g. 20% of the dataset), to be able to identify the most
> promising parameterizations quickly and evaluate how sensitive they are
> to a doubling of the dataset size. That would make it possible to select
> a smaller grid for a parameter search on the full dataset, and also to
> compute learning curves for bias-variance analysis of the individual
> parameters.
>
> --
> Olivier