Re: RDDs caching in typical machine learning use cases

2016-04-04 Thread Eugene Morozov
Hi, Yes, I believe people do that. I also believe that Spark ML can figure out on its own when to cache some of its internal RDDs. That's definitely true for the random forest algorithm. It doesn't hurt to cache the same RDD twice, either. But it's not clear what exactly you want to know... -- Be well! Jean Morozov
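To make the point about caching concrete, here is a minimal sketch in plain Python (no Spark required) of why an iterative training loop benefits from a cached input, and why caching the same dataset twice does no harm. The `LazyDataset` class, its `compute_count` counter, and the `train` function are hypothetical names made up for this illustration; they only mimic the lazy-recompute behaviour of an RDD and are not Spark APIs. In real Spark code the corresponding calls would be `rdd.cache()` or `rdd.persist()`.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class).

    Recomputes every record from the source on each pass unless cached.
    """

    def __init__(self, source, transform):
        self._source = source            # raw records
        self._transform = transform      # per-record function, e.g. parsing
        self._cache_requested = False
        self._cached = None              # materialized rows after first pass
        self.compute_count = 0           # number of full recomputations

    def cache(self):
        # Idempotent: calling this a second time just sets the same flag.
        self._cache_requested = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached          # served from cache, no recompute
        self.compute_count += 1
        rows = [self._transform(rec) for rec in self._source]
        if self._cache_requested:
            self._cached = rows
        return rows


def train(dataset, iterations=10, lr=0.01):
    """Fit y = w * x by gradient descent; each iteration is one full pass."""
    w = 0.0
    for _ in range(iterations):
        rows = dataset.collect()
        grad = sum((w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
    return w


raw = [(float(i), 2.0 * i) for i in range(1, 11)]   # points on y = 2x
parse = lambda rec: rec                             # placeholder for real parsing

uncached = LazyDataset(raw, parse)
train(uncached)

cached = LazyDataset(raw, parse).cache().cache()    # caching twice is harmless
w = train(cached)

print(uncached.compute_count, cached.compute_count)  # prints "10 1"
```

Without caching, the input is rebuilt on every one of the ten iterations; with caching, it is built once and reused. This is only a toy model: it ignores memory pressure, eviction, and storage levels, all of which matter when deciding what to cache on a real cluster.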

RDDs caching in typical machine learning use cases

2016-04-03 Thread Sergey
Hi Spark ML experts! Do you use RDD caching together with MLlib to speed up computation? I mean typical machine learning use cases: train-test split, train, evaluate, apply the model. Sergey.