Re: [Scikit-learn-general] GPs in sklearn

Kyle Kastner Tue, 25 Nov 2014 09:51:15 -0800

France is nice and probably a lot warmer than "new France" right now :)

The HODLR (Hierarchical Off-Diagonal Low-Rank) solver is a
low-rank approximation technique that lets you use a Sherman-Morrison-Woodbury
update (lots of Google references) to efficiently update the inverse of
the GP covariance matrix [paper here http://arxiv.org/pdf/1403.5337v2.pdf]. GEORGE was
actually the implementation I was looking at to try and copy! It seemed
pretty good to me :)
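
For reference, the identity behind that update is usually called the Woodbury (or Sherman-Morrison-Woodbury) matrix identity: (A + U C V)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1. The NumPy sketch below only illustrates the identity on a diagonal-plus-low-rank matrix; it is not the HODLR algorithm itself, which (per the paper) applies this kind of low-rank update recursively over off-diagonal blocks. The function name `woodbury_solve` is hypothetical.

```python
import numpy as np


def woodbury_solve(d, U, b):
    """Solve (diag(d) + U @ U.T) x = b with the Woodbury identity.

    d : (n,) positive diagonal entries (e.g. per-point noise variances)
    U : (n, k) low-rank factor with k << n
    b : (n,) right-hand side
    Only a k x k system is solved densely, so the cost is O(n k^2)
    instead of the O(n^3) of a dense solve.
    """
    Ainv_b = b / d                      # A^{-1} b for diagonal A
    Ainv_U = U / d[:, None]             # A^{-1} U
    k = U.shape[1]
    # Small "capacitance" matrix C^{-1} + V A^{-1} U, with C = I and V = U.T
    cap = np.eye(k) + U.T @ Ainv_U
    correction = Ainv_U @ np.linalg.solve(cap, U.T @ Ainv_b)
    return Ainv_b - correction


rng = np.random.default_rng(0)
n, k = 500, 10
d = rng.uniform(0.5, 2.0, size=n)
U = rng.standard_normal((n, k))
b = rng.standard_normal(n)

x_fast = woodbury_solve(d, U, b)
x_dense = np.linalg.solve(np.diag(d) + U @ U.T, b)
print(np.allclose(x_fast, x_dense))  # expected: True
```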

This leads to a computational complexity of O(n log^2 n) for the update part,
where a naive inverse would be O(n^3). sklearn's current implementation
seems to fall somewhere between these two, if I remember right, though I can't
recall the exact method used.
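
To put rough numbers on that gap, the snippet below just evaluates the two growth rates for a few problem sizes. It ignores constant factors entirely (real HODLR constants depend on the chosen rank and tolerance), so the ratios are only an order-of-magnitude sketch, not a benchmark.

```python
import math


def dense_cost(n):
    """Growth of a dense inverse / Cholesky factorization: n^3."""
    return n ** 3


def hodlr_cost(n):
    """Growth rate n * log(n)^2 quoted for the HODLR approach (base 2 here)."""
    return n * math.log2(n) ** 2


for n in (1_000, 10_000, 100_000):
    ratio = dense_cost(n) / hodlr_cost(n)
    print(f"n={n:>7}: n^3 / (n log^2 n) ~ {ratio:.1e}")
```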

Really, there are a lot of recent methods for "efficient" GPs that are all
centered around matrix approximation and/or representing data clusters with
latent variables. I feel like *one* of them should be in sklearn, but it
will probably be a tradeoff of implementation complexity vs. speed. But the
speed improvements are possibly too big to ignore: thinking of the wide
adoption random forests now have, a lot of that was due to handling lots
of data well. I can see a few cases where handling ~10k to 100k datapoints
could make GPs generally usable for timeseries work.
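
As one concrete member of the matrix-approximation family (a swapped-in example, not HODLR and not something proposed in this thread), the sketch below uses a Nystrom-style low-rank approximation K ~ K_nm K_mm^-1 K_mn built from m inducing points, in the "subset of regressors" form. The predictive mean then only needs an m x m solve, so it costs O(n m^2) rather than O(n^3). The random choice of inducing points, the RBF kernel, the fixed hyperparameters, and all names are assumptions for illustration.

```python
import numpy as np


def rbf(X1, X2, length_scale=1.0):
    """Squared-exponential kernel matrix between rows of X1 and X2."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)


def sor_predict_mean(X, y, X_test, m=30, noise=0.1, length_scale=1.0, seed=0):
    """Subset-of-regressors GP mean with m randomly chosen inducing points.

    Only m x m systems are solved, so the cost is O(n m^2).
    """
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)]  # inducing inputs
    K_mm = rbf(Z, Z, length_scale)
    K_nm = rbf(X, Z, length_scale)
    K_tm = rbf(X_test, Z, length_scale)
    # Solve (noise^2 K_mm + K_mn K_nm) w = K_mn y; tiny jitter for stability
    A = noise ** 2 * K_mm + K_nm.T @ K_nm + 1e-6 * np.eye(m)
    w = np.linalg.solve(A, K_nm.T @ y)
    return K_tm @ w


rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5000)
X_test = np.linspace(-2.5, 2.5, 7)[:, None]
mean = sor_predict_mean(X, y, X_test)
print(np.max(np.abs(mean - np.sin(X_test[:, 0]))))  # should be small
```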

Kernel engineering seems useful, at least to me, and recent work has started
to automate it in many ways. See Zoubin Ghahramani's recent work on the
"Automated Statistician" or Andrew Wilson's GPatt kernel. These are both
ways to make the exploration of data easier (read: automated) for our
users, which could be very useful. But that is just IMO.
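
Kernel engineering here mostly means building covariance functions by adding and multiplying simpler ones; the "Automated Statistician" line of work searches over such sums and products. The NumPy sketch below shows only the composition step (no search), with hypothetical helper names: a sum of kernels acts like adding independent processes, and a product like modulating one process by another.

```python
import numpy as np


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def periodic(x1, x2, period=1.0, length_scale=1.0):
    """Periodic (exp-sine-squared) kernel on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)


def linear(x1, x2, offset=0.0):
    """Linear (dot-product) kernel on 1-D inputs."""
    return (x1[:, None] - offset) * (x2[None, :] - offset)


def composite(x1, x2):
    # Linear trend + locally periodic component. Sums and products of
    # valid kernels are again valid (positive semi-definite) kernels.
    return linear(x1, x2) + rbf(x1, x2, 5.0) * periodic(x1, x2, period=1.0)


x = np.linspace(0.0, 10.0, 200)
K = composite(x, x)
# Draw a few prior samples; the small jitter keeps the Cholesky stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = L @ np.random.default_rng(0).standard_normal((len(x), 3))
print(K.shape, samples.shape)
```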

I think the current lack of adoption is largely a combination of strangeness in
the API and not enough documentation for how to do "normal" GP things. This
example in particular is quite hard to follow:
http://scikit-learn.org/stable/auto_examples/gaussian_process/gp_diabetes_dataset.html#example-gaussian-process-gp-diabetes-dataset-py
It uses a lot of mathematical names for internal variables, coupled with a
somewhat different API (returning MSE seems weird to me; why not store it
somewhere or have a separate getter?).

Also, I really, really, really hate the term "nugget".
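
As a rough picture of the "normal GP things" above, here is a minimal exact GP regressor in NumPy built on a Cholesky factorization (the O(n^3) step). It keeps the predictive variance behind its own method instead of returning it from `predict`, and names the value added to the kernel diagonal `noise_variance`, which plays the role the current API calls the nugget. The class and method names are hypothetical; this is not scikit-learn's implementation.

```python
import numpy as np


class TinyGPRegressor:
    """Exact GP regression with a squared-exponential kernel (illustrative)."""

    def __init__(self, length_scale=1.0, signal_variance=1.0, noise_variance=1e-2):
        self.length_scale = length_scale
        self.signal_variance = signal_variance
        self.noise_variance = noise_variance  # added to the diagonal of K

    def _kernel(self, A, B):
        sq = (np.sum(A ** 2, axis=1)[:, None]
              + np.sum(B ** 2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return self.signal_variance * np.exp(
            -0.5 * np.maximum(sq, 0.0) / self.length_scale ** 2)

    def fit(self, X, y):
        self.X_train_ = np.asarray(X, dtype=float)
        K = self._kernel(self.X_train_, self.X_train_)
        K[np.diag_indices_from(K)] += self.noise_variance
        self.L_ = np.linalg.cholesky(K)  # O(n^3)
        # alpha = K^{-1} y via two solves with the Cholesky factor
        # (np.linalg.solve does not exploit the triangular structure).
        y = np.asarray(y, dtype=float)
        self.alpha_ = np.linalg.solve(self.L_.T, np.linalg.solve(self.L_, y))
        return self

    def predict(self, X):
        """Posterior mean only, like any other regressor's predict."""
        X = np.asarray(X, dtype=float)
        return self._kernel(X, self.X_train_) @ self.alpha_

    def predict_variance(self, X):
        """Posterior variance of the latent function, kept separate."""
        X = np.asarray(X, dtype=float)
        v = np.linalg.solve(self.L_, self._kernel(X, self.X_train_).T)
        return self.signal_variance - np.sum(v ** 2, axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X[:, 0])
gp = TinyGPRegressor(length_scale=1.0, noise_variance=1e-4).fit(X, y)
X_new = np.array([[1.0], [2.5]])
print(gp.predict(X_new), gp.predict_variance(X_new))
```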

On Tue, Nov 25, 2014 at 12:37 PM, Andy <t3k...@gmail.com> wrote:

> There are definitely API questions that I also just discussed with Dan.
> There are some things that we could improve, but I think the solutions
> depend a lot on whether we want to do kernel engineering or not.
> My thinking was that this part is probably the most controversial one,
> so this is what I asked about.
>
> I am not sure how useful our current implementation is. I have not met a
> person that uses it.
> That might be caused by the interface and missing documentation more
> than by the fact that we don't have custom kernels, though.
>
> I don't think we want to add a HODLR solver (hierarchical off-diagonal
> low-rank solver).
> If you want that, you should probably install george ;)
>
> What I think would be great to have is gradient based optimization of
> the kernel parameters, and a way to do grid-search
> using data likelihood (not sure if that is currently supported).
>
> For the API, the naming is somewhat non-standard, it is not super clear
> what the parameters mean, and it is also not super clear
> whether the kernel parameters will be optimized for a given parameter
> setting.
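
For the gradient-based optimization of kernel parameters mentioned above, the quantity usually maximized is the log marginal likelihood, log p(y | X, theta) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi), whose gradient is 1/2 tr((alpha alpha^T - K^-1) dK/dtheta) with alpha = K^-1 y. The NumPy sketch below does this for a single RBF length scale parameterized as theta = log(ell), checks the analytic gradient against a finite difference, and also scores a small grid of length scales by likelihood. The function name, the 1-D inputs, and the fixed noise level are assumptions for illustration.

```python
import numpy as np


def lml_and_grad(log_ell, x, y, noise=0.1):
    """Log marginal likelihood of a 1-D RBF GP and its derivative w.r.t. log(ell)."""
    ell = np.exp(log_ell)
    d2 = (x[:, None] - x[None, :]) ** 2
    K_f = np.exp(-0.5 * d2 / ell ** 2)           # noise-free kernel
    K = K_f + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = (-0.5 * y @ alpha
           - np.sum(np.log(np.diag(L)))          # equals -0.5 * log|K|
           - 0.5 * len(x) * np.log(2 * np.pi))
    dK = K_f * d2 / ell ** 2                     # dK / d(log ell)
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(len(x))))
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)
    return lml, grad


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 60))
y = np.sin(x) + 0.1 * rng.standard_normal(60)

# Analytic gradient vs. central finite difference at one point
theta, eps = np.log(0.7), 1e-5
_, g = lml_and_grad(theta, x, y)
fd = (lml_and_grad(theta + eps, x, y)[0]
      - lml_and_grad(theta - eps, x, y)[0]) / (2 * eps)
print(g, fd)  # should agree closely

# Likelihood-based grid search over the length scale
grid = np.logspace(-1, 1, 21)
scores = [lml_and_grad(np.log(ell), x, y)[0] for ell in grid]
best_ell = grid[int(np.argmax(scores))]
print(best_ell)
```

Since the function returns a (value, gradient) pair, a small wrapper returning the negated value and the negated gradient as an array could be handed to `scipy.optimize.minimize` with `jac=True` for the gradient-based version.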
>
> On 11/25/2014 12:28 PM, Gael Varoquaux wrote:
> > On Tue, Nov 25, 2014 at 12:23:50PM -0500, Kyle Kastner wrote:
> >> specifically a HODLR solver.
> > What is this? Can you tell us more?
> >
> >> One very specific reason to focus on GP code quality would be that it
> >> opens the door to use sklearn's own code to implement some very nice
> >> hyperparameter search algorithms which could be useful to many users.
> > Yes. I would pretty badly like this (Kyle, are you looking for an
> > internship? You know, France is nice :P).
> >
> > Gaël
> >
> >
> ------------------------------------------------------------------------------
> > Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> > from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> > with Interactivity, Sharing, Native Excel Exports, App Integration & more
> > Get technology previously reserved for billion-dollar corporations, FREE
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
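
The GP-based hyperparameter search mentioned in this thread is commonly framed as Bayesian optimization: fit a GP to (hyperparameter, validation loss) pairs, then choose the next setting to try with an acquisition function. Expected improvement is one common choice; the sketch below computes it for minimization from a GP's posterior mean and standard deviation at candidate points. The GP fit itself is assumed to happen elsewhere, and the function name and example numbers are hypothetical.

```python
import math

import numpy as np


def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement (for minimization) at candidate points.

    mu, sigma : GP posterior mean and standard deviation at the candidates
    f_best    : best (lowest) objective value observed so far
    xi        : small exploration margin
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid /0
    improvement = f_best - mu - xi
    z = improvement / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


# Three candidate settings with made-up GP posterior predictions
mu = np.array([0.30, 0.25, 0.40])
sigma = np.array([0.01, 0.05, 0.20])
ei = expected_improvement(mu, sigma, f_best=0.28)
next_idx = int(np.argmax(ei))
print(ei, next_idx)
```

The third candidate has a worse predicted mean but a large uncertainty, so its expected improvement ends up close to the second one's; that balance between exploiting good predictions and exploring uncertain regions is the point of the acquisition function.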
