Hi,
After searching the machine learning library for streaming algorithms, I
found two that fit the criteria: Streaming Linear Regression
(https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression)
and Streaming K-Means
(https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means).
However, both use the RDD-based API (MLlib) rather than the DataFrame-based
API (ML). Are there any plans to bring them to ML?
Also, is there any technical reason why there are so few incremental
algorithms in the machine learning library? There is only one algorithm
each for regression and clustering, and nothing for classification,
dimensionality reduction, or feature extraction.
If there is a reason, how were those two algorithms implemented despite
it? If there isn't, what is the general consensus on adding new online
machine learning algorithms?
Regards,
Lucas Chagas