Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/16222#discussion_r91823358
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -768,8 +768,46 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
head(predict(isoregModel, newDF))
```
-#### What's More?
-We also expect Decision Tree, Random Forest, Kolmogorov-Smirnov Test coming in the next version 2.1.0.
+### Logistic Regression Model
+
+(Added in 2.1.0)
+
+[Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
+There are two types of logistic regression models, namely binomial logistic regression (i.e., response is binary) and multinomial
+logistic regression (i.e., response falls into multiple classes). We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
+It supports both binary and multiclass classification, elastic-net regularization, and feature standardization, similar to `glmnet`.
+
+
+`spark.logit` fits a Logistic Regression Model against a Spark DataFrame. The `family` parameter can be used to select between the
+binomial and multinomial algorithms, or leave it unset and Spark will infer the correct variant.
+
+We use a simple example to demonstrate `spark.logit` usage. In general, there are three steps of using `spark.logit`:
+1). Create a dataframe from a proper data source; 2). Fit a logistic regression model using `spark.logit` with a proper parameter setting;
+and 3). Obtain the coefficient matrix of the fitted model using `summary` and use the model for prediction with `predict`.
+
+Binomial logistic regression
+```{r, warning=FALSE}
+df <- createDataFrame(iris)
+# Create a dataframe containing two classes
+training <- df[df$Species %in% c("versicolor", "virginica"), ]
+model <- spark.logit(training, Species ~ ., regParam = 0.5)
+summary <- summary(model)
+head(summary)
+```
+
+Predict values on training data
+```{r}
+fitted <- predict(model, training)
+```
+
+Multinomial logistic regression against three classes
+```{r, warning=FALSE}
+df <- createDataFrame(iris)
+# Note family = "multinomial" is optional in this case since the dataset has multiple classes.
--- End diff --
This reads like `family = "binomial"` is required if the dataset has only two classes.
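To illustrate why `family` is optional in both the two-class and three-class examples, here is a minimal base-R sketch (no Spark session needed) of the `family = "auto"` selection rule as described in the SparkR `spark.logit` documentation: "binomial" when the label has one or two classes, "multinomial" otherwise. `infer_family` is a hypothetical helper written only for this illustration; it is not part of SparkR, and Spark's actual class counting happens inside its formula/indexing pipeline.

```r
# Hypothetical helper (NOT part of SparkR): mirrors the documented
# family = "auto" rule of spark.logit, which picks "binomial" when the
# label has one or two distinct classes and "multinomial" otherwise.
infer_family <- function(labels) {
  n_classes <- length(unique(labels))  # count observed classes only
  if (n_classes <= 2) "binomial" else "multinomial"
}

# Two-class subset, as in the binomial example of the vignette.
two_class <- iris[iris$Species %in% c("versicolor", "virginica"), ]
infer_family(two_class$Species)   # "binomial"

# All three species, as in the multinomial example.
infer_family(iris$Species)        # "multinomial"
```

Under this rule, neither example needs an explicit `family`, so the comment in the vignette could say that `family` is optional in both cases.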