Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16222#discussion_r91823013

    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -768,8 +768,46 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
     head(predict(isoregModel, newDF))
     ```
     
    -#### What's More?
    -We also expect Decision Tree, Random Forest, Kolmogorov-Smirnov Test coming in the next version 2.1.0.
    +### Logistic Regression Model
    +
    +(Added in 2.1.0)
    +
    +[Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
    +There are two types of logistic regression models, namely binomial logistic regression (i.e., response is binary) and multinomial
    +logistic regression (i.e., response falls into multiple classes). We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
    +It supports both binary and multiclass classification, elastic-net regularization, and feature standardization, similar to `glmnet`.
    +
    +
    +`spark.logit` fits an Logistic Regression Model against a Spark DataFrame. The `family` parameter can be used to select between the
    +binomial and multinomial algorithms, or leave it unset and Spark will infer the correct variant.
    +
    +We use a simple example to demonstrate `spark.logit` usage. In general, there are three steps of using `spark.logit`:
    +1). Create a dataframe from proper data source; 2). Fit a logistic regression model using `spark.logit` with a proper parameter setting;
    +and 3). Obtain the coefficient matrix of the fitted model using `summary` and use the model for prediction with `predict`.
    +
    +Binomial logistic regression
    +```{r, warning=FALSE}
    +df <- createDataFrame(iris)
    +# Create a dataframe containing two classes
    +training <- df[df$Species %in% c("versicolor", "virginica"), ]
    +model <- spark.logit(training, Species ~ ., regParam = 0.5)
    +summary <- summary(model)
    --- End diff --

    Unfortunately, we haven't implemented `print.summary` yet. If the output of `summary(model)` is still reasonably human-readable, we should just call it directly. Once `print.summary` is implemented, we won't need to change the code here.
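For context, here is a minimal sketch in plain R (no SparkR) of the S3 mechanism this comment relies on: if `summary()` returns an object with its own class, a matching `print` method controls how a bare `summary(model)` call is displayed, so a vignette chunk that simply calls `summary(model)` picks up the nicer output as soon as such a method exists. The class name `demoModel` and the helpers `fit_demo`, `summary.demoModel`, and `print.summary.demoModel` are hypothetical and for illustration only; SparkR's model classes are S4 and its actual `summary` implementation may differ.

```r
# Hypothetical model class "demoModel", for illustration only (not SparkR's API).
fit_demo <- function(coefficients) {
  structure(list(coefficients = coefficients), class = "demoModel")
}

# summary() returns an object carrying its own class rather than a bare list ...
summary.demoModel <- function(object, ...) {
  structure(list(coefficients = object$coefficients),
            class = "summary.demoModel")
}

# ... so a print method can decide how that object is displayed.
print.summary.demoModel <- function(x, ...) {
  cat("Coefficients:\n")
  print(x$coefficients)
  invisible(x)
}

model <- fit_demo(c(intercept = 0.1, slope = 2))
# At top level, the visible result is auto-printed via print.summary.demoModel.
summary(model)
```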