[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17967 Merged into master, thanks for all. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 yes I'd hold this for a day.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung @yanboliang I'm fine with either the ASCII table or the HTML table. It's your call. Hope to get over this minor doc issue and get this PR in soon. I can update the doc later

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 Given that, I think I'm OK with an ASCII table as a one-time thing. Thoughts?

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 This is what we get from the current doc: ![image](https://cloud.githubusercontent.com/assets/11082368/26430799/dd49fa4c-40a4-11e7-95c6-66def9a8f588.png)

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 I tried using ``, but in Scaladoc, it is not correctly formatted. I tried a few other options, but it seems the html attributes are ignored in Scaladoc.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 I think an HTML table is better? https://github.com/apache/spark/pull/17967#discussion_r117917444 + @srowen for your opinion; to be honest, I don't think I've actually seen a table in Spark

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17967 Personally, I would prefer an HTML list or table. But I am fine with the current status if this is okay with all of you here (as I guess none of them is particularly better given all the

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17967 @actuaryzhang Thanks for your clarification, it makes sense. This looks good to me. @HyukjinKwon @felixcheung What do you think of the documentation issue?

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang I updated the example in the param doc. I hope it is clear now that it is `alphabetDesc` that drops the same category as R. That is, RFormula with `alphabetDesc` drops the first
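
To make the point about `alphabetDesc` concrete, here is a minimal pure-Python sketch (not Spark code). It assumes the four order-type names used by Spark's `StringIndexer` (`frequencyDesc`, `frequencyAsc`, `alphabetDesc`, `alphabetAsc`), a drop-last encoding as described for `OneHotEncoder` in this thread, and R's default treatment contrasts, which take the first (alphabetically smallest) factor level as the baseline. The helper names and the toy data are hypothetical, and frequency ties are broken alphabetically only to keep the sketch deterministic.

```python
from collections import Counter


def index_levels(values, order_type):
    """Order the distinct levels the way a StringIndexer-like step would (index 0 first)."""
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent level first; alphabetical tie-break is an assumption of this sketch.
        return sorted(counts, key=lambda lv: (-counts[lv], lv))
    if order_type == "frequencyAsc":
        return sorted(counts, key=lambda lv: (counts[lv], lv))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError(f"unknown order type: {order_type}")


def dropped_level(values, order_type):
    """With drop-last encoding, the level holding the highest index gets no dummy column."""
    return index_levels(values, order_type)[-1]


def r_reference_level(values):
    """Baseline level under R's default treatment contrasts (first level, alphabetical)."""
    return sorted(set(values))[0]


# Hypothetical toy column: "c" appears 3 times, "a" twice, "b" once.
data = ["a", "a", "b", "c", "c", "c"]

for order_type in ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"):
    print(order_type, "drops", dropped_level(data, order_type))
print("R baseline:", r_reference_level(data))
```

On this toy column only `alphabetDesc` drops `a`, the level that R would also treat as the baseline; the other three orderings drop `b` or `c`.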

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77209/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77209/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77209/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung Is the html tag `` supported? Tried this but failed to compile...

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang I understand your points. The issue is `OneHotEncoder` only supports `dropLast`. The ideal solution to match R exactly (both the category dropped and ordering of feature

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-22 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 hmm, should we just use html ``?

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77130/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77130/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77130/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @HyukjinKwon @felixcheung I confirm it works for Javadoc. ![image](https://cloud.githubusercontent.com/assets/11082368/26277962/21dbe70e-3d46-11e7-978f-e422b9122e87.png)

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17967 (FWIW, `{{{ ... }}}` should work for Javadoc too given my past try - https://github.com/apache/spark/pull/15999#discussion_r89580586)

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77116/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77116/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77116/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-20 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung @HyukjinKwon Thanks much for pointing out the documentation issues. I still prefer to have a table to clearly illustrate what each option is doing. Made a new commit to

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77110/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77110/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77110/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang Thanks for the review and suggestion. Makes lots of sense. I made a new commit to address these.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang Thanks for the question. The alphabetically ascending order in R is very convenient for display purposes. For example, when you do a summary of model results, the results

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77085/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77085/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17967 LGTM

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #77085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77085/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @viirya Great point. Added a comment to explain this in the doc.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung Once this PR gets in, I'll update the SparkR side and include some tests.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 Thanks for the example. I think it shows very concretely that this change would be useful.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Merged build finished. Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #76913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76913/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76913/ Test PASSed.

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17967 **[Test build #76913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76913/testReport)** for PR 17967 at commit

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung Thanks for the review. I fixed some typos. Below is an example to show the difference in model estimates due to different string ordering between R and RFormula.
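
The example from the original message is not preserved in this archive. As a stand-in, the following pure-Python sketch (not the author's example, and not Spark or R code) illustrates the general effect on hypothetical toy data: with a single categorical predictor and dummy coding, the least-squares intercept equals the mean of the dropped (baseline) level and each coefficient is that level's mean minus the baseline mean. Changing which level is dropped therefore changes the reported estimates while leaving the fitted values unchanged. The function names and data are assumptions made for illustration.

```python
from statistics import mean


def dummy_coded_fit(levels, y, baseline):
    """Least-squares estimates for y ~ factor with `baseline` as the dropped level."""
    groups = {}
    for lv, val in zip(levels, y):
        groups.setdefault(lv, []).append(val)
    means = {lv: mean(vals) for lv, vals in groups.items()}
    # Intercept is the baseline mean; other coefficients are offsets from it.
    coefs = {"(Intercept)": means[baseline]}
    for lv in sorted(means):
        if lv != baseline:
            coefs[lv] = means[lv] - means[baseline]
    return coefs


def fitted(coefs, level):
    """Fitted value for one level; the baseline level has no coefficient of its own."""
    return coefs["(Intercept)"] + coefs.get(level, 0.0)


# Hypothetical toy data: group means are a=2.0, b=5.0, c=8.0.
levels = ["a", "a", "b", "c", "c", "c"]
y = [1.0, 3.0, 5.0, 7.0, 8.0, 9.0]

fit_r = dummy_coded_fit(levels, y, baseline="a")     # baseline R would pick
fit_freq = dummy_coded_fit(levels, y, baseline="b")  # least frequent level dropped

print(fit_r)     # {'(Intercept)': 2.0, 'b': 3.0, 'c': 6.0}
print(fit_freq)  # {'(Intercept)': 5.0, 'a': -3.0, 'c': 3.0}
```

Both fits predict the same value for every level, so the models are equivalent; only the coefficient table differs, which is what makes side-by-side comparison with R output confusing when the baselines do not match.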

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17967 cool - I think this is important to have. do you have a higher level example of the old/new model output as affected by the string ordering?

[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-12 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @yanboliang @MLnick @HyukjinKwon @jkbradley @sethah