Re: [MLlib] PCA Aggregator

2018-10-19 Thread Matt Saunders
en project, if you develop a third party library
>
> On Fri, Oct 19, 2018, 2:32 PM Matt Saunders wrote:
>
>> Thanks, Eric. I went ahead and created SPARK-25782 for this improvement
>> since it is a feature I and others have looked for in MLlib, but doesn't
>> seem t

Re: [MLlib] PCA Aggregator

2018-10-19 Thread Matt Saunders
or it, and then a pull
> request. Another possibility is to publish it as your own 3rd party
> library, which I have done for aggregators before.
>
>
> On Wed, Oct 17, 2018 at 4:54 PM Matt Saunders wrote:
>
>> I built an Aggregator that computes PCA on grouped datasets. I wa

[MLlib] PCA Aggregator

2018-10-17 Thread Matt Saunders
I built an Aggregator that computes PCA on grouped datasets. I wanted to use the PCA functions provided by MLlib, but they only work on a full dataset, and I needed to do it on a grouped dataset (like a RelationalGroupedDataset). So I built a little Aggregator that can do that, here’s an example o
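
For readers unfamiliar with the pattern, the idea of a per-group PCA aggregator can be sketched without Spark. The code below is not the code from this thread and does not use MLlib. It is a minimal plain-Python illustration that borrows the four step names of Spark's Aggregator (zero, reduce, merge, finish). The buffer layout, the function names and the use of power iteration to find only the first principal component are all assumptions made for this sketch.

```python
from collections import defaultdict
from math import sqrt

def zero(dim):
    # Empty buffer: row count, per-column sums, sums of outer products.
    return {"n": 0, "s": [0.0] * dim, "ss": [[0.0] * dim for _ in range(dim)]}

def reduce_row(buf, x):
    # Fold one row (a sequence of floats) into the buffer.
    buf["n"] += 1
    for i, xi in enumerate(x):
        buf["s"][i] += xi
        for j, xj in enumerate(x):
            buf["ss"][i][j] += xi * xj
    return buf

def merge(a, b):
    # Combine two partial buffers, e.g. from different partitions.
    d = len(a["s"])
    a["n"] += b["n"]
    for i in range(d):
        a["s"][i] += b["s"][i]
        for j in range(d):
            a["ss"][i][j] += b["ss"][i][j]
    return a

def finish(buf, iters=200):
    # Build the sample covariance matrix, then estimate its leading
    # eigenvector (the first principal component) by power iteration.
    n, d = buf["n"], len(buf["s"])
    if n < 2:
        raise ValueError("need at least two rows per group")
    mean = [s / n for s in buf["s"]]
    cov = [[(buf["ss"][i][j] - n * mean[i] * mean[j]) / (n - 1)
            for j in range(d)] for i in range(d)]
    # Hypothetical fixed start vector; power iteration fails if it happens
    # to be orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v

def pca_by_group(rows, dim):
    # rows: iterable of (group_key, vector) pairs.
    bufs = defaultdict(lambda: zero(dim))
    for key, x in rows:
        reduce_row(bufs[key], x)
    return {k: finish(b) for k, b in bufs.items()}
```

Because the buffer holds only counts and sums, partial buffers can be merged in any order, which is what lets this kind of computation run per group rather than on a full dataset. A production version would return several components and use a proper eigendecomposition instead of power iteration.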