[GitHub] spark pull request #16222: [SPARK-18797][SparkR]:Update spark.logit in spark...

mengxr Fri, 09 Dec 2016 17:09:20 -0800

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16222#discussion_r91823013
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -768,8 +768,46 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
     head(predict(isoregModel, newDF))
     ```
     
    -#### What's More?
    -We also expect Decision Tree, Random Forest, Kolmogorov-Smirnov Test coming in the next version 2.1.0.
    +### Logistic Regression Model
    +
    +(Added in 2.1.0)
    +
    +[Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
    +There are two types of logistic regression models, namely binomial logistic regression (i.e., response is binary) and multinomial
    +logistic regression (i.e., response falls into multiple classes). We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
    +It supports both binary and multiclass classification, elastic-net regularization, and feature standardization, similar to `glmnet`.
    +
    +
    +`spark.logit` fits a logistic regression model against a Spark DataFrame. The `family` parameter can be used to select between the
    +binomial and multinomial algorithms; if it is left unset, Spark infers the correct variant.
    +
    +We use a simple example to demonstrate `spark.logit` usage. In general, there are three steps to using `spark.logit`:
    +(1) create a DataFrame from a proper data source; (2) fit a logistic regression model using `spark.logit` with a proper parameter setting;
    +and (3) obtain the coefficient matrix of the fitted model using `summary` and use the model for prediction with `predict`.
    +
    +Binomial logistic regression
    +```{r, warning=FALSE}
    +df <- createDataFrame(iris)
    +# Create a dataframe containing two classes
    +training <- df[df$Species %in% c("versicolor", "virginica"), ]
    +model <- spark.logit(training, Species ~ ., regParam = 0.5)
    +summary <- summary(model)
    --- End diff --
    
    Unfortunately, we didn't implement `print.summary`. If the output of `summary(model)` is still reasonably human-readable, we should use it. Once we implement `print.summary`, we won't need to change the code here.
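For illustration, here is a minimal base-R sketch of the kind of S3 `print` method meant above. It is not SparkR code: the class name `summary.LogisticRegressionModel`, the helper `new_logit_summary`, and the assumption that the summary is a list holding a `coefficients` matrix are all hypothetical, and the coefficient values are placeholders rather than output from a fitted model.

```r
# Minimal sketch of an S3 print method for a logistic regression summary.
# Assumptions (hypothetical, not SparkR's actual API): the summary is a list
# with a `coefficients` matrix and carries the class
# "summary.LogisticRegressionModel".

# Hypothetical constructor that tags a coefficient matrix with the summary class.
new_logit_summary <- function(coefficients) {
  structure(list(coefficients = coefficients),
            class = "summary.LogisticRegressionModel")
}

# print() dispatches on the class attribute, so this method is used both for
# explicit print(s) calls and for auto-printing at the console.
print.summary.LogisticRegressionModel <- function(x, digits = 4L, ...) {
  cat("Logistic regression model summary\n\n")
  cat("Coefficients:\n")
  print(round(x$coefficients, digits))
  invisible(x)
}

# Placeholder coefficients for illustration only (not output from a real model).
coefs <- matrix(c(0.5, -1.25, 2),
                ncol = 1,
                dimnames = list(c("(Intercept)", "x1", "x2"), "Estimate"))
s <- new_logit_summary(coefs)
print(s)
```

Because R resolves `print()` through the object's class, a method like this (if SparkR's summary object carried such a class) would change how the result displays without any edit to the vignette code that calls `summary(model)`.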


