2013/6/9 Joel Nothman <jnoth...@student.usyd.edu.au>:
> Thanks, Olivier. Those are some interesting use-cases:
>
>> A- Fault tolerance and handling missing results caused by evaluation
>> errors
>
> I don't think this affects the output format, except where we can actually
> get partial results for a fold, or if we want to report successful folds
> and ignore others for a single candidate parameter setting. But I wonder
> if that just makes things much too complicated.

It's not complicated to store successful results in a list and failed
parameters + matching error tracebacks in another. The log of successful
evaluations could either be a list of dicts or a list of namedtuples. The
list of dicts option is probably more flexible if we want to make it
possible for the user to collect additional evaluation attributes, by
passing a callback for instance.
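For instance, something along these lines (just a sketch; the names and
fields are made up, this is not an existing scikit-learn API):

    import traceback

    # Hypothetical raw logs: one plain list for successful evaluations
    # (one dict per parameters x fold combination) and one for failures
    # (parameters + formatted traceback of the error).
    successful_evaluations = []
    failed_evaluations = []

    def log_one_evaluation(estimator, parameters, X_train, y_train,
                           X_test, y_test, fold_id):
        """Fit one candidate on one CV fold and append the outcome."""
        try:
            estimator.set_params(**parameters).fit(X_train, y_train)
            successful_evaluations.append({
                'parameters': parameters,
                'fold_id': fold_id,
                'test_score': estimator.score(X_test, y_test),
                # more keys (train_time, model diagnostics, subsample
                # size...) can be added later without changing the
                # data structure
            })
        except Exception:
            failed_evaluations.append({
                'parameters': parameters,
                'fold_id': fold_id,
                'traceback': traceback.format_exc(),
            })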
>> B: Being able to monitor partial results and interrupt search before
>> waiting for the end (e.g. by handling KeyboardInterrupt using an async
>> job scheduling API)
>
> So the stop and resume case just means the results need to be appendable...?

Yes, mostly. But that also means that we should be able to compute mean
scores over 2 out of 5 folds and then recompute the mean scores later when
we get access to all 5 fold results. Hence my proposal to store the raw
evaluations in a plain list and offer public methods to compute aggregated,
user-friendly summaries of the partial or complete results.

> In general, I don't think Parallel's returning a list is of great benefit
> here. Working with an iterable would be more comfortable.

Yes, we might need to make joblib.Parallel evolve to support task
submission and async retrieval to implement this. I think this is one of
the possible design goals envisioned by Gael as a possible evolution of
the joblib project.

>> C1: Refining the search space
>
> Similarly, it should be possible to have fit append further results.

Yes.

>> C2: Refining the cross-validation
> and
>> D: warm starting with larger subsamples of the dataset
>
> I would think in these cases it's better to create a new estimator and/or
> keep results separate.

Although I think those two are very important to manage the exploration /
exploitation trade-off faced by ML researchers and practitioners, I also
agree they could be addressed in later evolutions of scikit-learn, or maybe
even as separate projects such as https://github.com/jaberg/hyperopt or
https://github.com/pydata/pyrallel

I would just like to emphasize that storing the raw evaluation logs as a
plain python list would make it possible to deal with this kind of future
evolution if we ever decide to implement these use cases directly in
scikit-learn. Hence I think that the data structure that stores the
evaluation results should be as simple as possible and avoid making any
assumptions on the kind of aggregations or the number of axes we will
collect during the search. Basically, adding support for sub-sampling will
add a new axis for possible aggregations, and if we use 2D numpy rec-arrays
as the primary data structure with 1 row per parameter setting we won't be
able to implement that use case at all without breaking the API once again.

>> Optional attributes we could add in the future:
>
> Something you missed: the ability to get back diagnostics on the quality /
> complexity of the model, e.g. coefficient sparsity.

Yes. I think we could extend the fit_grid_point API to make it possible to
pass an arbitrary python callback that would have access to the fitted
estimator and the CV fold, and collect any kind of additional model
properties to be included in the search report.
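For instance, such a callback could look like this (hypothetical sketch;
fit_grid_point does not have such a hook today):

    import numpy as np

    def sparsity_callback(fitted_estimator, evaluation_record):
        """Hypothetical user callback called after each (parameters,
        fold) fit: it receives the fitted estimator and the dict about
        to be appended to the raw evaluation log, and can store any
        extra model diagnostic."""
        coef = getattr(fitted_estimator, 'coef_', None)
        if coef is not None:
            evaluation_record['coef_sparsity'] = float(np.mean(coef == 0))

Because the log entries are plain dicts, such extra keys require no schema
change and are naturally absent for models that do not expose coef_.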
> These suggestions do make me consider storage in an external database (a
> blob store, or an online spreadsheet) as hyperopt allows. I think "allows"
> is important here: when you get to that scale of experimentation, you
> probably don't want results logged only in memory. But we need a sensible
> default for working with a few thousand candidates.

I agree, but I think we should keep that for another thread.

> Except for purity of parallelism, I don't see why you would want to store
> each fold result for a single candidate separately. I don't see the
> use-case for providing them separately to the user (except where one fold
> failed and another succeeded).

To make it easy to:

- deal with partial / incomplete results (either for fault tolerance or
  early stopping / monitoring),
- extend the size of an existing dimension (e.g. collecting 5 random CV
  folds instead of 3) in a warm restart of the search,
- add a new dimension (e.g. subsamples of the dataset), possibly in a warm
  restart of the search instance,

by not making any assumptions on the kind of estimates the user will want
in a future version of the lib.

> As far as I'm concerned, the frontend should hide that.

Yes, that's why I propose to provide public methods to compute interesting
aggregates from the raw evaluation log.

> I do see that providing all fields together for a single candidate is the
> most common use-case and argues against providing parallel arrays (but not
> against a structured array / recarray).

Structured arrays / recarrays have 2 issues:

- they handle missing / partial results badly, or at least there is no
  uniform solution, since the missing data marker depends on the dtype of
  the column (NaN for floats, -1 as a marker for ints, None for
  dtype=object?); furthermore, slots for missing results have to be
  pre-allocated;
- they do not naturally handle changes in dimension sizes or in the number
  of dimensions.

> Finally, the single most important thing I can see about making results
> explorable is not providing candidate parameter settings only as dicts,
> but splitting the dicts out so that you can query by the value of each
> parameter, and group over others.

Yes, but if we go for the simple evaluation log list I propose, this can
always be provided by dedicated methods (see the sketch below). Furthermore,
be aware that the number of parameters is not always the same for each
result item of a GridSearchCV. See:

http://scikit-learn.org/stable/modules/grid_search.html#gridsearchcv

This is a valid param grid:

    param_grid = [
        {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
        {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
         'kernel': ['rbf']},
    ]

The gamma parameter is only present when `kernel == 'rbf'`. Expanding this
into the columns of a rec-array is not very natural I think. This is
similar to the sparsity issue mentioned earlier.
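As an illustration of such a dedicated method (again a hypothetical sketch
working on the raw list of dicts described earlier), grouping by the value
of one parameter can simply skip the entries that do not define it:

    from collections import defaultdict

    def mean_test_score_by_param(evaluation_log, param_name):
        """Average test scores grouped by the value of one parameter.

        Entries whose parameter dict does not define param_name (e.g.
        'gamma' with a linear kernel) are skipped, and partial results
        are averaged over whatever folds are available so far."""
        scores_by_value = defaultdict(list)
        for record in evaluation_log:
            if param_name in record['parameters']:
                scores_by_value[record['parameters'][param_name]].append(
                    record['test_score'])
        return dict((value, sum(scores) / len(scores))
                    for value, scores in scores_by_value.items())

    # e.g.: mean_test_score_by_param(successful_evaluations, 'gamma')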
> This may be getting into crazy land, and certainly close to reimplementing
> Pandas for the 2d case, or recarrays with benefits, but: imagine we had a
> SearchResult object with:
> * attributes like fold_test_score, fold_train_score, fold_train_time,
> each a 2d array.
> * __getattr__ magic that produced mean_test_score, mean_train_time, etc.
> and std_test_score, std_train_time on demand (weighted by some
> samples_per_fold attr if iid=True).
> * attributes like param_C that would enable selecting certain candidates
> by their parameter settings (through numpy-style boolean queries).
> * __getitem__ that can pull out one or more candidates by index (and
> returns a SearchResult).
> * a method that returns a dict of selected 1d array attributes for
> Pandas-style (or spreadsheet? in that case a list of dicts) integration
> * a method that zips over selected attributes for simple iteration.
>
> Is this crazy, or does it do exactly what we want? or both? And how does
> it not meet the needs of your wishlist, Olivier (except where the number
> of folds differ)?

Interesting, but I am not sure I understand it all. Can you give an example
of a typical series of instructions that would leverage such a SearchResult
object from an interactive python session to introspect it?

Furthermore, such a SearchResult instance could always be computed on
demand, or at the end of the computation, from the raw evaluation log, or
even wrap the raw evaluation log internally.

Basically I am advocating Event Sourcing [1] as a design goal for the
primary data structure used to store the evaluation results. Let us make as
few assumptions as possible on the kind of data we want to collect and on
how the user will aggregate those data to find the best models.

[1] http://martinfowler.com/eaaDev/EventSourcing.html

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel