[GitHub] spark pull request #13285: [Spark-15129][R][DOC]R API changes in ML

yanboliang Sun, 05 Jun 2016 11:55:17 -0700

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13285#discussion_r65822439
  
    --- Diff: docs/sparkr.md ---
    @@ -285,71 +285,28 @@ head(teenagers)
     
     # Machine Learning
     
    -SparkR allows the fitting of generalized linear models over DataFrames using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib to train a model of the specified family. Currently the gaussian and binomial families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
    +SparkR supports the following Machine Learning algorithms.
     
    -The [summary()](api/R/summary.html) function gives the summary of a model produced by [glm()](api/R/glm.html).
    +* Generalized Linear Regression Model [spark.glm()](api/R/glm.html)
    +* Naive Bayes [spark.naiveBayes()](api/R/naiveBayes.html)
    +* KMeans [spark.kmeans()](api/R/kmeans.html)
    +* AFT Survival Regression [spark.survreg()](api/R/survreg.html)
     
    -* For gaussian GLM model, it returns a list with 'devianceResiduals' and 'coefficients' components. The 'devianceResiduals' gives the min/max deviance residuals of the estimation; the 'coefficients' gives the estimated coefficients and their estimated standard errors, t values and p-values. (It only available when model fitted by normal solver.)
    -* For binomial GLM model, it returns a list with 'coefficients' component which gives the estimated coefficients.
    +Generalized Linear Regression can be used to train a model from a specified family. Currently the Gaussian, Binomial, Poisson and Gamma families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
     
    -The examples below show the use of building gaussian GLM model and binomial GLM model using SparkR.
    +The [summary()](api/R/summary.html) function gives the summary of a model produced by different algorithms listed above.
    +This summary is same as the result of summary() function in R.
     
    -## Gaussian GLM model
    +## Model persistence
     
    -<div data-lang="r"  markdown="1">
    -{% highlight r %}
    -# Create the DataFrame
    -df <- createDataFrame(sqlContext, iris)
    -
    -# Fit a gaussian GLM model over the dataset.
    -model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
    -
    -# Model summary are returned in a similar format to R's native glm().
    -summary(model)
    -##$devianceResiduals
    -## Min       Max     
    -## -1.307112 1.412532
    -##
    -##$coefficients
    -##                   Estimate  Std. Error t value  Pr(>|t|)    
    -##(Intercept)        2.251393  0.3697543  6.08889  9.568102e-09
    -##Sepal_Width        0.8035609 0.106339   7.556598 4.187317e-12
    -##Species_versicolor 1.458743  0.1121079  13.01195 0           
    -##Species_virginica  1.946817  0.100015   19.46525 0           
    -
    -# Make predictions based on the model.
    -predictions <- predict(model, newData = df)
    -head(select(predictions, "Sepal_Length", "prediction"))
    -##  Sepal_Length prediction
    -##1          5.1   5.063856
    -##2          4.9   4.662076
    -##3          4.7   4.822788
    -##4          4.6   4.742432
    -##5          5.0   5.144212
    -##6          5.4   5.385281
    -{% endhighlight %}
    -</div>
    +* write.ml allows users to save a fitted model in a given input path
    --- End diff --
    
    Please make the function name a link here: ```[write.ml](api/R/write.ml.html)```, and ditto for ```read.ml```.
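
For readers unfamiliar with the two functions under discussion, here is a minimal sketch of a save/load round trip in SparkR. It is an illustration only, not part of the pull request: it assumes a Spark 2.0-style session started with `sparkR.session()`, and the temporary path `modelPath` is a hypothetical name chosen for the example.

```r
library(SparkR)

# Assumption: Spark 2.0-style entry point; older releases used sparkR.init().
sparkR.session()

# Fit a Gaussian GLM on iris. SparkR replaces '.' in column names with '_'.
df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# Save the fitted model to a path, then load it back.
modelPath <- tempfile(pattern = "glm-model")  # hypothetical path for this sketch
write.ml(model, modelPath)
sameModel <- read.ml(modelPath)

# The reloaded model can be summarized like the original one.
summary(sameModel)

# Clean up the saved model directory and stop the session.
unlink(modelPath, recursive = TRUE)
sparkR.session.stop()
```

By default `write.ml` refuses to write to a path that already exists; passing `overwrite = TRUE` changes that, which is why the sketch uses a fresh temporary path.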


