Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-11-02 Thread YiZhi Liu
Hi Tsai,

Would it be appropriate for me to create a JIRA and try to work on it?

2015-10-23 10:40 GMT+08:00 YiZhi Liu <javeli...@gmail.com>:
> Thank you Tsai.
>
> Holden, would you mind posting the JIRA issue id here? I searched but
> found nothing. Thanks.
>
> 2015-10-23 1:36 GMT+08:00 DB Tsai <dbt...@dbtsai.com>:
>> There is a JIRA for this. I know Holden is interested in this.
>>
>>
>> On Thursday, October 22, 2015, YiZhi Liu <javeli...@gmail.com> wrote:
>>>
>>> Would someone mind giving me a hint?
>>>
>>> 2015-10-20 15:34 GMT+08:00 YiZhi Liu <javeli...@gmail.com>:
>>> > Hi all,
>>> >
>>> > I noticed that in ml.classification.LogisticRegression, users are not
>>> > allowed to set initial coefficients, while this is supported in
>>> > mllib.classification.LogisticRegressionWithSGD.
>>> >
>>> > Sometimes we know that specific coefficients are close to the final
>>> > optimum. For example, we usually use yesterday's output model as the
>>> > initial coefficients, since the data distribution of the training
>>> > samples shouldn't change much between two consecutive days.
>>> >
>>> > Is there any reason this feature is not supported?
>>> >
>>> > --
>>> > Yizhi Liu
>>> > Senior Software Engineer / Data Mining
>>> > www.mvad.com, Shanghai, China
>>>
>>>
>>>
>>> --
>>> Yizhi Liu
>>> Senior Software Engineer / Data Mining
>>> www.mvad.com, Shanghai, China
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>
>>
>> --
>> - DB
>>
>> Sent from my iPhone
>
>
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




Re: Spark Implementation of XGBoost

2015-10-26 Thread YiZhi Liu
There's an XGBoost exploration JIRA, SPARK-8547. Could it be a good starting point?

2015-10-27 7:07 GMT+08:00 DB Tsai <dbt...@dbtsai.com>:
> Also, does it support categorical features?
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> Interesting. For feature sub-sampling, is it per-node or per-tree? Do
>> you think you can implement a generic GBM and have it merged as part of
>> the Spark codebase?
>>
>> Sincerely,
>>
>> DB Tsai
>> --
>> Web: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>>
>> On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu
>> <rotationsymmetr...@gmail.com> wrote:
>>> Hi Spark User/Dev,
>>>
>>> Inspired by the success of XGBoost, I have created a Spark package for
>>> gradient boosted trees with a second-order approximation of arbitrary
>>> user-defined loss functions.
>>>
>>> https://github.com/rotationsymmetry/SparkXGBoost
>>>
>>> Currently linear (normal) regression, binary classification, and
>>> Poisson regression are supported. You can extend it with other loss
>>> functions as well.
>>>
>>> L1 and L2 regularization, bagging, and feature sub-sampling are also
>>> employed to avoid overfitting.
>>>
>>> Thank you for testing. I am looking forward to your comments and
>>> suggestions. Bugs or improvements can be reported through GitHub.
>>>
>>> Many thanks!
>>>
>>> Meihua
>>>
>>> -----
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>
>
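
The second-order approximation mentioned in the announcement above can be
sketched as follows. This is a minimal illustration in plain Python of the
formulation popularised by XGBoost, not code taken from SparkXGBoost; all
function names, the `l2` and `gamma` parameters, and the toy data are
assumptions made for the example. Each loss supplies its first and second
derivative with respect to the current prediction, and the leaf value and
split gain are computed from the sums of those derivatives.

```python
import math


def logistic_grad_hess(label, margin):
    """Gradient and Hessian of the log loss with respect to the raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - label, p * (1.0 - p)


def squared_grad_hess(label, prediction):
    """Gradient and Hessian of 0.5 * (prediction - label) ** 2."""
    return prediction - label, 1.0


def leaf_weight(grads, hessians, l2=1.0):
    """Leaf value minimising the quadratic approximation plus an L2 penalty."""
    return -sum(grads) / (sum(hessians) + l2)


def _score(grads, hessians, l2):
    # Structure score G^2 / (H + l2) of a node holding these rows.
    g, h = sum(grads), sum(hessians)
    return g * g / (h + l2)


def split_gain(left, right, l2=1.0, gamma=0.0):
    """Approximate loss reduction from splitting a node into left and right.

    Each side is a (grads, hessians) pair of equal-length lists; gamma is a
    fixed penalty charged for adding a leaf.
    """
    lg, lh = left
    rg, rh = right
    parent = _score(lg + rg, lh + rh, l2)
    return 0.5 * (_score(lg, lh, l2) + _score(rg, rh, l2) - parent) - gamma


# Squared loss with a current prediction of 0.0 for every row (toy data).
labels = [1.0, 1.0, 3.0, 3.0]
pairs = [squared_grad_hess(y, 0.0) for y in labels]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

print(leaf_weight(grads, hessians, l2=0.0))  # 2.0, the mean label
print(split_gain((grads[:2], hessians[:2]),
                 (grads[2:], hessians[2:]), l2=0.0))  # 2.0
```

Supporting a new loss in this sketch only requires another function that
returns the (gradient, Hessian) pair, which is what makes the approach work
for arbitrary user-defined losses.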



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread YiZhi Liu
Thank you Tsai.

Holden, would you mind posting the JIRA issue id here? I searched but
found nothing. Thanks.

2015-10-23 1:36 GMT+08:00 DB Tsai <dbt...@dbtsai.com>:
> There is a JIRA for this. I know Holden is interested in this.
>
>
> On Thursday, October 22, 2015, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>> Would someone mind giving me a hint?
>>
>> 2015-10-20 15:34 GMT+08:00 YiZhi Liu <javeli...@gmail.com>:
>> > Hi all,
>> >
>> > I noticed that in ml.classification.LogisticRegression, users are not
>> > allowed to set initial coefficients, while this is supported in
>> > mllib.classification.LogisticRegressionWithSGD.
>> >
>> > Sometimes we know that specific coefficients are close to the final
>> > optimum. For example, we usually use yesterday's output model as the
>> > initial coefficients, since the data distribution of the training
>> > samples shouldn't change much between two consecutive days.
>> >
>> > Is there any reason this feature is not supported?
>> >
>> > --
>> > Yizhi Liu
>> > Senior Software Engineer / Data Mining
>> > www.mvad.com, Shanghai, China
>>
>>
>>
>> --
>> Yizhi Liu
>> Senior Software Engineer / Data Mining
>> www.mvad.com, Shanghai, China
>>
>>
>
>
> --
> - DB
>
> Sent from my iPhone



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread YiZhi Liu
Would someone mind giving me a hint?

2015-10-20 15:34 GMT+08:00 YiZhi Liu <javeli...@gmail.com>:
> Hi all,
>
> I noticed that in ml.classification.LogisticRegression, users are not
> allowed to set initial coefficients, while this is supported in
> mllib.classification.LogisticRegressionWithSGD.
>
> Sometimes we know that specific coefficients are close to the final
> optimum. For example, we usually use yesterday's output model as the
> initial coefficients, since the data distribution of the training
> samples shouldn't change much between two consecutive days.
>
> Is there any reason this feature is not supported?
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




Ability to offer initial coefficients in ml.LogisticRegression

2015-10-20 Thread YiZhi Liu
Hi all,

I noticed that in ml.classification.LogisticRegression, users are not
allowed to set initial coefficients, while this is supported in
mllib.classification.LogisticRegressionWithSGD.

Sometimes we know that specific coefficients are close to the final
optimum. For example, we usually use yesterday's output model as the
initial coefficients, since the data distribution of the training
samples shouldn't change much between two consecutive days.

Is there any reason this feature is not supported?
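
To make the request concrete, here is a minimal sketch of the warm-start
idea in plain Python. It does not use Spark, and nothing in it is an
existing Spark API: the `fit_logistic` function, its `initial_coefficients`
argument, and the toy data are all hypothetical. The solver is ordinary
batch gradient descent on the mean logistic loss, and it reports how many
updates it needed, so a start from yesterday's coefficients can be compared
with a start from zeros.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, initial_coefficients=None,
                 step=0.5, tol=1e-6, max_iter=10000):
    """Fit by batch gradient descent on the mean logistic loss.

    Returns (coefficients, number_of_updates). `initial_coefficients` is a
    hypothetical warm-start argument; zeros are used when it is omitted.
    """
    dim = len(rows[0])
    if initial_coefficients is None:
        w = [0.0] * dim
    else:
        w = list(initial_coefficients)
    n = len(rows)
    for updates in range(max_iter):
        grad = [0.0] * dim
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad[j] += err * x[j] / n
        # Stop as soon as every gradient component is below the tolerance.
        if max(abs(g) for g in grad) < tol:
            return w, updates
        w = [wi - step * g for wi, g in zip(w, grad)]
    return w, max_iter


def make_rows(values):
    # Each row is (intercept term, single feature).
    return [(1.0, v) for v in values]


# "Yesterday": a tiny, non-separable toy sample (illustrative data only).
day1_rows = make_rows([-2, -1, -1, 0, 0, 1, 1, 2])
day1_labels = [0, 0, 1, 0, 1, 0, 1, 1]

# "Today": the same sample plus two new points.
day2_rows = day1_rows + make_rows([-0.5, 0.5])
day2_labels = day1_labels + [0, 1]

yesterday_w, _ = fit_logistic(day1_rows, day1_labels)
cold_w, cold_updates = fit_logistic(day2_rows, day2_labels)
warm_w, warm_updates = fit_logistic(day2_rows, day2_labels,
                                    initial_coefficients=yesterday_w)

print("updates from zeros:", cold_updates)
print("updates from yesterday's model:", warm_updates)
```

When the new data resembles the old, the warm start begins close to the
optimum and should need fewer updates; in the extreme case of refitting on
identical data, the first gradient check already passes and no update is
made.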

-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread YiZhi Liu
Hi Joseph,

Thank you for clarifying the motivation for setting up a separate API
for ml pipelines; it sounds great. But I still think we could extract
some common parts of the training & inference procedures for ml and
mllib. In ml.classification.LogisticRegression, you simply transform
the DataFrame into an RDD and follow the same procedures as in
mllib.optimization.{LBFGS,OWLQN}, right?

My suggestion, if I may, is that the ml package should focus on the
public API and leave the underlying implementations, e.g. numerical
optimization, to the mllib package.

Please let me know if I have misunderstood anything. Thank you!
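
To illustrate the layering I have in mind, here is a rough sketch in plain
Python rather than Spark code. Every name in it (`gradient_descent`,
`least_squares_cost`, `RowApi`, `TableApi`) is made up for the example, and
a simple least-squares fit stands in for logistic regression. The point is
only that two public front ends with different input formats can delegate
to one shared optimization routine.

```python
def gradient_descent(cost_fn, initial, step=0.5, iterations=500):
    """Shared numerical core: it knows nothing about input data formats."""
    w = list(initial)
    for _ in range(iterations):
        _, grad = cost_fn(w)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


def least_squares_cost(points):
    """Build a cost function returning (loss, gradient) for y ~ w0 + w1 * x."""
    n = len(points)

    def cost(w):
        loss, g0, g1 = 0.0, 0.0, 0.0
        for x, y in points:
            residual = w[0] + w[1] * x - y
            loss += 0.5 * residual * residual / n
            g0 += residual / n
            g1 += residual * x / n
        return loss, [g0, g1]

    return cost


class RowApi:
    """Older-style front end: takes a list of (x, y) tuples."""

    def train(self, points):
        return gradient_descent(least_squares_cost(points), [0.0, 0.0])


class TableApi:
    """Newer-style front end: takes records with named columns."""

    def __init__(self, feature_col="x", label_col="y"):
        self.feature_col = feature_col
        self.label_col = label_col

    def fit(self, records):
        # Convert to the core's input format, then delegate.
        points = [(r[self.feature_col], r[self.label_col]) for r in records]
        return gradient_descent(least_squares_cost(points), [0.0, 0.0])


points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy data on y = 1 + 2x
records = [{"x": x, "y": y} for x, y in points]

print(RowApi().train(points))
print(TableApi().fit(records))
```

Both front ends run exactly the same numerical code, so a fix or
improvement in the optimizer would reach both APIs at once.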

2015-10-08 1:15 GMT+08:00 Joseph Bradley <jos...@databricks.com>:
> Hi YiZhi Liu,
>
> The spark.ml classes are part of the higher-level "Pipelines" API, which
> works with DataFrames.  When creating this API, we decided to separate it
> from the old API to avoid confusion.  You can read more about it here:
> http://spark.apache.org/docs/latest/ml-guide.html
>
> For (3): We use Breeze, but we have to modify it in order to do distributed
> optimization based on Spark.
>
> Joseph
>
> On Tue, Oct 6, 2015 at 11:47 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> I'm curious about the difference between
>> ml.classification.LogisticRegression and
>> mllib.classification.LogisticRegressionWithLBFGS. Both of them are
>> optimized using LBFGS; the only difference I see is that
>> LogisticRegression takes a DataFrame while LogisticRegressionWithLBFGS
>> takes an RDD.
>>
>> So I wonder:
>> 1. Why not simply add a DataFrame training interface to
>> LogisticRegressionWithLBFGS?
>> 2. What's the difference between the ml.classification and
>> mllib.classification packages?
>> 3. Why doesn't ml.classification.LogisticRegression call
>> mllib.optimization.LBFGS / mllib.optimization.OWLQN directly? Instead,
>> it uses breeze.optimize.LBFGS and re-implements most of the procedures
>> in mllib.optimization.{LBFGS,OWLQN}.
>>
>> Thank you.
>>
>> Best,
>>
>> --
>> Yizhi Liu
>> Senior Software Engineer / Data Mining
>> www.mvad.com, Shanghai, China
>>
>>
>



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China




What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-07 Thread YiZhi Liu
Hi everyone,

I'm curious about the difference between
ml.classification.LogisticRegression and
mllib.classification.LogisticRegressionWithLBFGS. Both of them are
optimized using LBFGS; the only difference I see is that
LogisticRegression takes a DataFrame while LogisticRegressionWithLBFGS
takes an RDD.

So I wonder:
1. Why not simply add a DataFrame training interface to
LogisticRegressionWithLBFGS?
2. What's the difference between the ml.classification and
mllib.classification packages?
3. Why doesn't ml.classification.LogisticRegression call
mllib.optimization.LBFGS / mllib.optimization.OWLQN directly? Instead,
it uses breeze.optimize.LBFGS and re-implements most of the procedures
in mllib.optimization.{LBFGS,OWLQN}.

Thank you.

Best,

-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China
