Firstly, a note that I've added that example to the doctest on my branch,
with some extensions to show selecting over parameter values and grouping
over named fields (e.g. identifying the 'C' with the best result per
'degree').
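
A minimal sketch of that kind of grouping, in plain NumPy rather than through the SearchResult interface on the branch. The scores are copied from the doctest output quoted below; the structured-array layout and the name best_C_per_degree are illustrative assumptions only:

```python
import numpy as np

# Mean test scores copied from the quoted doctest output (C, degree, score).
results = np.array(
    [
        (0.01, 1, 0.673333333333),
        (0.01, 2, 0.926666666667),
        (0.01, 3, 0.966666666667),
        (0.1, 1, 0.94),
        (0.1, 2, 0.966666666667),
        (0.1, 3, 0.973333333333),
        (1.0, 1, 0.966666666667),
        (1.0, 2, 0.966666666667),
        (1.0, 3, 0.966666666667),
    ],
    dtype=[('C', float), ('degree', int), ('mean_test_score', float)],
)

# Group by 'degree' and keep the 'C' of the best-scoring row in each group.
best_C_per_degree = {}
for degree in np.unique(results['degree']):
    group = results[results['degree'] == degree]
    # argmax returns the first maximum, so ties go to the smaller C here.
    best = group[np.argmax(group['mean_test_score'])]
    best_C_per_degree[int(degree)] = float(best['C'])

print(best_C_per_degree)
```

Note that degree 2 has a tie between C=0.1 and C=1.0, so any real interface would need to say how ties are broken.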

I think hyperopt's use of mongodb (as an alternative) sounds a lot like what
you're proposing. The other case we should eventually support is finding
the best result while keeping no log whatsoever. In the meantime, I would
like to give users an interface to access more than just the score for the
full set of results; but yes, it could become merely an option for log
handling / analysis.

Regarding structured arrays:

> - they badly handle missing / partial results or at least there is not
> uniform solution as missing data markers would depend on the dtype of
> the column, e.g.: NaNs for floats, -1 as a marker for ints, None for
> dtype=object? Furthermore missing results are pre-allocated.

mrecarrays handle the masking issues, albeit providing a bit of a clumsy
interface <http://numpy-discussion.10968.n7.nabble.com/mrecarray-indexing-behaviour-td33532.html>.
I currently use such masking for missing parameters in cases like your:

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

It gets a bit messy, but selecting by parameter value still works as
expected. And yes, the preallocation is a bit of a problem; this takes up
unnecessary space, but generally not as much unnecessary space as a series
of dicts! (Admittedly array storage of string parameters is a bit wasteful
of memory when stored with dtype=np.string_ rather than dtype=object.)
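
To make the masking concrete, here is a rough sketch using a masked structured array from numpy.ma (not the mrecarray-based code on my branch). The helper expand, the placeholder values, and the field layout are assumptions made up for this illustration:

```python
from itertools import product

import numpy.ma as ma

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]


def expand(grid):
    """Yield one dict per parameter combination in a single grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = [cand for grid in param_grid for cand in expand(grid)]

# One field per parameter; a parameter absent from a candidate is masked.
fields = [('C', float), ('gamma', float), ('kernel', 'U6')]
placeholders = {'C': 0.0, 'gamma': 0.0, 'kernel': ''}

rows, masks = [], []
for cand in candidates:
    rows.append(tuple(cand.get(name, placeholders[name]) for name, _ in fields))
    masks.append(tuple(name not in cand for name, _ in fields))

table = ma.array(rows, mask=masks, dtype=fields)

# Selecting by parameter value still works; 'gamma' stays masked for the
# linear-kernel candidate and unmasked for the two rbf candidates.
is_c10 = ma.filled(table['C'] == 10, False)
selected = table[is_c10]
print(selected)
```

Every row is preallocated with a slot for 'gamma', which is the wasted space mentioned above, but the mask gives a uniform missing-value marker regardless of the column's dtype.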

> - they do not naturally handle change in dimension sizes or number of
> dimensions:

No, they don't. My current solution does not handle changes in number of
folds / dimensions. It handles the subset of data with two dimensions of
the same size (with possibly-masked parameters and maybe results in the
future too). I think that's still pretty useful in most cases; and it could
perhaps have a different storage backend with the same frontend to handle
the heterogeneous size case.

Btw, one thing I haven't implemented on SearchResult is an __array__ method
that returns an mrecarray of all parameters and result means and stds (where
dtypes allow), suitable for import into pandas or export to CSV.


On Mon, Jun 10, 2013 at 2:25 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> 2013/6/9 Joel Nothman <jnoth...@student.usyd.edu.au>:
> > Again, it's probably over the top, but I think it's a useful interface
> > (prototyped at
> > https://github.com/jnothman/scikit-learn/tree/search_results):
> >
> > >>> from __future__ import print_function
> > >>> from sklearn.grid_search import GridSearchCV
> > >>> from sklearn.datasets import load_iris
> > >>> from sklearn.svm import SVC
> > >>> iris = load_iris()
> > >>> grid = {'C': [0.01, 0.1, 1], 'degree': [1, 2, 3]}
> > >>> search = GridSearchCV(SVC(kernel='poly'),
> > ...     param_grid=grid).fit(iris.data, iris.target)
> > >>> res = search.results_
> > >>> res.best().mean_test_score
> > 0.97333333333333338
> > >>> res
> > <9 candidates. Best results:
> >   <0.973 for {'C': 0.10000000000000001, 'degree': 3}>,
> >   <0.967 for {'C': 1.0, 'degree': 3}>,
> >   <0.967 for {'C': 1.0, 'degree': 2}>, ...>
> > >>> for tup in res.zipped('parameters', 'mean_test_score',
> > ...                       'std_test_score'):
> > ...     print(*tup)
> > ...
> > {'C': 0.01, 'degree': 1} 0.673333333333 0.033993463424
> > {'C': 0.01, 'degree': 2} 0.926666666667 0.00942809041582
> > {'C': 0.01, 'degree': 3} 0.966666666667 0.0188561808316
> > {'C': 0.10000000000000001, 'degree': 1} 0.94 0.0163299316186
> > {'C': 0.10000000000000001, 'degree': 2} 0.966666666667 0.0188561808316
> > {'C': 0.10000000000000001, 'degree': 3} 0.973333333333 0.00942809041582
> > {'C': 1.0, 'degree': 1} 0.966666666667 0.0249443825785
> > {'C': 1.0, 'degree': 2} 0.966666666667 0.00942809041582
> > {'C': 1.0, 'degree': 3} 0.966666666667 0.0188561808316
>
> I very much like that but I still think that we should keep the raw
> evaluation log to make it easier to implement future extensions.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>