
After searching Spark's machine learning library for streaming algorithms, I found two that fit my criteria: Streaming Linear Regression (https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression) and Streaming K-Means (https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means).
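For reference, the Streaming K-Means documentation linked above describes a mini-batch update with a decay factor α. Each center is moved toward the mean of the new points assigned to it, weighted by how many points it has already absorbed, and that older weight is discounted by α (α = 1 keeps all history, α = 0 uses only the latest batch). The snippet below is a minimal pure-Python sketch of that rule, not Spark's code. The function names `nearest` and `update`, and the choice to also decay the stored per-cluster weight, are assumptions made for illustration.

```python
def nearest(centers, p):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], p)))


def update(centers, weights, batch, alpha):
    """One mini-batch step of a decayed ("forgetful") k-means update.

    Hypothetical helper, not a Spark API.
    centers: list of tuples (c_t); weights: list of floats (n_t);
    batch: list of point tuples; alpha: decay factor in [0, 1].
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assign every point in the batch to its nearest current center.
    for p in batch:
        i = nearest(centers, p)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]

    new_centers, new_weights = [], []
    for i in range(k):
        m = counts[i]
        n_decayed = weights[i] * alpha  # n_t * alpha
        if m == 0:
            # No new points: keep the center, but still discount its weight
            # (assumption of this sketch).
            new_centers.append(centers[i])
            new_weights.append(n_decayed)
            continue
        total = n_decayed + m
        # c_{t+1} = (c_t * n_t * alpha + x_t * m_t) / (n_t * alpha + m_t);
        # x_t * m_t is simply the sum of this batch's points in cluster i.
        new_centers.append(tuple((centers[i][d] * n_decayed + sums[i][d]) / total
                                 for d in range(dim)))
        new_weights.append(total)
    return new_centers, new_weights
```

For example, with centers `[(0.0, 0.0), (10.0, 10.0)]`, weights `[1.0, 1.0]`, α = 1 and a batch `[(1.0, 1.0), (9.0, 9.0), (11.0, 11.0)]`, the first center moves halfway toward (1, 1) while the second stays put, because its two new points average exactly to the old center. Because each batch only touches per-cluster sums and counts, the update is cheap enough to run once per streaming interval.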

However, both use the RDD-based MLlib API rather than the DataFrame-based ML API. Are there any plans to bring them to ML?

Also, is there a technical reason why there are so few incremental algorithms in the machine learning library? There is only one algorithm each for regression and clustering, and nothing for classification, dimensionality reduction or feature extraction.

If there is such a reason, how were those two algorithms implemented in spite of it? If not, what is the general consensus on adding new online machine learning algorithms?

Lucas Chagas
