Hi,

Yes, I believe people do that. I also believe Spark ML can figure out on its own when to cache some internal RDDs; that's definitely true for the random forest algorithm. It also does no harm to cache the same RDD twice.
But it's not clear what you'd want to know...

--
Be well!
Jean Morozov

On Sun, Apr 3, 2016 at 11:34 AM, Sergey <ser...@gmail.com> wrote:
> Hi Spark ML experts!
>
> Do you use RDD caching somewhere together with MLlib to speed up
> calculation?
> I mean typical machine learning use cases:
> train-test split, train, evaluate, apply model.
>
> Sergey.