GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2047#discussion_r16453381
--- Diff: docs/mllib-linear-methods.md ---
@@ -518,6 +518,80 @@ print("Mean Squared Error = " + str(MSE))
</div>
</div>
+## Streaming linear regression
+
+When data arrive in a streaming fashion, it is useful to fit regression models online,
+updating the parameters of the model as new data arrive. MLlib currently supports
+streaming linear regression using ordinary least squares. The fitting is similar
+to that performed offline, except fitting occurs on each batch of data, so that
+the model continually updates to reflect the data from the stream.
+
+### Examples
+
+The following example demonstrates how to load training and testing data from two different
+input streams of text files, parse the streams as labeled points, fit a linear regression model
+online to the first stream, and make predictions on the second stream.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+
+First, we import the necessary classes for parsing our input data and creating the model.
+
+{% highlight scala %}
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
+
+{% endhighlight %}
+
+Then we make input streams for training and testing data. We assume a Streaming Context `ssc`
--- End diff --
`Streaming Context` -> `StreamingContext` or `streaming context`