Hi Spark Community,

Imagine you have a stable computing cluster (e.g. 5 nodes) with Hadoop that does not run anything other than your Spark jobs.
Now imagine you run simple machine learning on the data (e.g. 100MB):

1. K-means - 5 min
2. Logistic regression - 5 min

Now imagine that the volume of your data has doubled to 200MB and it is still distributed across those same 5 nodes. How much more time would this computation take? I presume more than 2x, e.g. K-means 25 min and logistic regression 20 min?

I just want to understand how data growth impacts computational performance for ML (any model from your experience is fine). My gut feeling is that if the data grows 2x, the computation time on the same cluster grows by more than 2x.

Thank you!
Vasyl
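
P.S. In case it helps, here is a minimal PySpark timing sketch of the kind of comparison I have in mind. The parquet paths, the "label" column, k, and the iteration counts are placeholders I made up, not my actual datasets or settings:

import time
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("scaling-benchmark").getOrCreate()

# Placeholder paths for the 100MB and 200MB versions of the same data.
for path in ["data_100mb.parquet", "data_200mb.parquet"]:
    df = spark.read.parquet(path)
    feature_cols = [c for c in df.columns if c != "label"]  # "label" is a placeholder column name
    assembled = (VectorAssembler(inputCols=feature_cols, outputCol="features")
                 .transform(df)
                 .cache())
    assembled.count()  # materialize the cache so that I/O is not part of the timing

    t0 = time.time()
    KMeans(k=10, maxIter=20, featuresCol="features").fit(assembled)
    print(f"{path}: KMeans fit took {time.time() - t0:.1f}s")

    t0 = time.time()
    LogisticRegression(featuresCol="features", labelCol="label", maxIter=20).fit(assembled)
    print(f"{path}: LogisticRegression fit took {time.time() - t0:.1f}s")

    assembled.unpersist()

spark.stop()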