Re: Huber regression in PySpark?

2017-08-20 Thread Yanbo Liang
Hi Jeff,

Actually, I have had an implementation of robust regression with Huber loss
for a long time (https://github.com/apache/spark/pull/14326). It is a fairly
straightforward port of scikit-learn's HuberRegressor.
The PR makes Huber regression a separate Estimator, but we found it can
be merged into LinearRegression.
I will update this PR ASAP, and I'm looking forward to your reviews and
comments.
After the Scala implementation is merged, it will be very easy to add the
corresponding PySpark API, and then you can use it to train a Huber
regression model in a distributed environment.
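
For reference, the loss itself is easy to sketch in plain Python. The
following is only a minimal illustration of the textbook Huber loss, not the
code in the PR. The function names are hypothetical, and delta=1.35 is assumed
here because it matches the default epsilon of scikit-learn's HuberRegressor.
scikit-learn's estimator also fits a scale parameter, which this sketch
leaves out.

```python
def huber_loss(residual, delta=1.35):
    """Textbook Huber loss for a single residual.

    Quadratic for small residuals, linear for large ones, so outliers
    contribute less than they would under squared error.
    delta is the switch point (assumed default: 1.35).
    """
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


def mean_huber_loss(y_true, y_pred, delta=1.35):
    """Average Huber loss over paired labels and predictions."""
    losses = [huber_loss(t - p, delta) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Made-up illustrative data; the last label is an outlier.
    y_true = [1.0, 2.0, 3.0, 100.0]
    y_pred = [1.1, 1.9, 3.2, 4.0]
    print(mean_huber_loss(y_true, y_pred))
```

The outlier in the last position adds a penalty that grows linearly with its
residual rather than quadratically, which is what makes the fit robust.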

Thanks
Yanbo

On Sun, Aug 20, 2017 at 3:19 PM, Jeff Gates <gatesa...@gmail.com> wrote:

> Hi guys,
>
> Is there Huber regression in PySpark? We are using sklearn HuberRegressor
> (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html)
> to train our model, but we are hitting a bottleneck on a single node.
> If not, is there any obstacle to implementing it in PySpark?
>
> Jeff
>


Huber regression in PySpark?

2017-08-20 Thread Jeff Gates
Hi guys,

Is there Huber regression in PySpark? We are using sklearn HuberRegressor
(http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html)
to train our model, but we are hitting a bottleneck on a single node.
If not, is there any obstacle to implementing it in PySpark?

Jeff