Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13285#discussion_r65221434
--- Diff: docs/sparkr.md ---
@@ -285,71 +285,57 @@ head(teenagers)
# Machine Learning
-SparkR allows the fitting of generalized linear models over DataFrames
using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib
to train a model of the specified family. Currently the gaussian and binomial
families are supported. We support a subset of the available R formula
operators for model fitting, including '~', '.', ':', '+', and '-'.
+SparkR supports the following machine learning algorithms:
-The [summary()](api/R/summary.html) function gives the summary of a model
produced by [glm()](api/R/glm.html).
+* Generalized Linear Regression Model [glm()](api/R/glm.html)
+* Naive Bayes [naiveBayes()](api/R/naiveBayes.html)
+* KMeans [kmeans()](api/R/kmeans.html)
+* AFT Survival Regression [survreg()](api/R/survreg.html)
-* For gaussian GLM model, it returns a list with 'devianceResiduals' and
'coefficients' components. The 'devianceResiduals' gives the min/max deviance
residuals of the estimation; the 'coefficients' gives the estimated
coefficients and their estimated standard errors, t values, and p-values. (These
are only available when the model is fitted by the normal solver.)
-* For binomial GLM model, it returns a list with 'coefficients' component
which gives the estimated coefficients.
+Under the hood, SparkR uses MLlib to train the model. For glm(), the
gaussian, binomial, poisson, and gamma families are currently supported. We
support a subset of the available R formula operators for model fitting,
including '~', '.', ':', '+', and '-'.
-The examples below show how to build gaussian and binomial GLM models
using SparkR.
+The [summary()](api/R/summary.html) function gives a summary of a model
produced by any of the algorithms listed above.
+The output follows a format similar to that of the corresponding summary() function in native R.
-## Gaussian GLM model
+## Model persistence
-<div data-lang="r" markdown="1">
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
+* write.ml allows users to save a fitted model to a given path
+* read.ml allows users to load back a model that was saved with
write.ml
+
+Model persistence is supported for all of the machine learning algorithms
above, for all families.
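A minimal sketch of this persistence workflow, assuming a model fitted as in the glm() sketch above (the path is only illustrative):

{% highlight r %}
# Save the fitted model to the given path
write.ml(model, "/tmp/sparkr-glm-model")

# Later, or in another SparkR session, load the model back and reuse it
sameModel <- read.ml("/tmp/sparkr-glm-model")
summary(sameModel)
{% endhighlight %}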
-# Fit a gaussian GLM model over the dataset.
-model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family =
"gaussian")
+The examples below show how to build Gaussian GLM, Naive Bayes, k-means,
and AFT survival regression models using SparkR.
+{% include_example r/ml.r %}
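For readers who do not have r/ml.r at hand, here is a rough sketch of what the naiveBayes() and survreg() calls look like; the argument order, data sets, and column names here are assumptions, so the linked API pages and r/ml.r remain the authoritative reference:

{% highlight r %}
# Naive Bayes with an R-style formula (columns are illustrative)
nbModel <- naiveBayes(Species ~ Sepal_Length + Sepal_Width, data = df)
summary(nbModel)

# AFT survival regression; Surv() specifies the censored response
library(survival)  # provides the ovarian data set used here for illustration
ovarianDF <- createDataFrame(sqlContext, ovarian)
aftModel <- survreg(Surv(futime, fustat) ~ ecog_ps + rx, data = ovarianDF)
summary(aftModel)
{% endhighlight %}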
+
+## GLM summary() result
+
+Here is an example of the output of the summary() function for a GLM:
+
+{% highlight r %}
# The model summary is returned in a similar format to R's native glm().
summary(model)
-##$devianceResiduals
-## Min Max
-## -1.307112 1.412532
-##
-##$coefficients
-## Estimate Std. Error t value Pr(>|t|)
-##(Intercept) 2.251393 0.3697543 6.08889 9.568102e-09
-##Sepal_Width 0.8035609 0.106339 7.556598 4.187317e-12
-##Species_versicolor 1.458743 0.1121079 13.01195 0
-##Species_virginica 1.946817 0.100015 19.46525 0
-
-# Make predictions based on the model.
-predictions <- predict(model, newData = df)
-head(select(predictions, "Sepal_Length", "prediction"))
-## Sepal_Length prediction
-##1 5.1 5.063856
-##2 4.9 4.662076
-##3 4.7 4.822788
-##4 4.6 4.742432
-##5 5.0 5.144212
-##6 5.4 5.385281
-{% endhighlight %}
-</div>
+##Deviance Residuals:
--- End diff ---
+1 Since the summary output is different for different models, it makes sense
to remove it. I will go ahead and remove it.