Hi Community, I have a 1 TB dataset containing records for 50 users; each user has about 20 GB of data on average.
I want to use Spark to train a machine learning model (e.g., an XGBoost tree model) for each user, so the result would ideally be 50 models. However, submitting 50 separate Spark jobs through 'spark-submit' would be infeasible. Since the model parameters and feature engineering steps are exactly the same for every user's data, I am wondering if there is a way to train these 50 models in parallel? Thanks!
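To make the question concrete, here is a minimal sketch of the kind of per-user training I have in mind, assuming PySpark 3.x (with groupBy().applyInPandas) and the xgboost Python package installed on the executors. The column names (user_id, label), paths, and XGBoost parameters below are placeholders, not my actual schema:

```python
import pandas as pd
import xgboost as xgb
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, BinaryType

spark = SparkSession.builder.appName("per-user-models").getOrCreate()

# ~1 TB of records for ~50 users (path is a placeholder)
df = spark.read.parquet("s3://my-bucket/user_records/")

# One row per user: the user id plus a serialized booster
result_schema = StructType([
    StructField("user_id", StringType()),
    StructField("model", BinaryType()),
])

def train_one_user(pdf: pd.DataFrame) -> pd.DataFrame:
    # The same feature engineering and model parameters are applied
    # to every user's slice of the data.
    features = pdf.drop(columns=["user_id", "label"])
    booster = xgb.train(
        params={"objective": "binary:logistic", "max_depth": 6},
        dtrain=xgb.DMatrix(features, label=pdf["label"]),
        num_boost_round=100,
    )
    return pd.DataFrame({
        "user_id": [pdf["user_id"].iloc[0]],
        "model": [bytes(booster.save_raw())],
    })

# Each user's group is trained independently, so the 50 models
# can be fit in parallel across the cluster within one job.
models = df.groupBy("user_id").applyInPandas(train_one_user, schema=result_schema)
models.write.parquet("s3://my-bucket/user_models/")
```

One thing I am unsure about with this approach is that applyInPandas pulls each user's ~20 GB group into a single pandas DataFrame on one executor, so I don't know whether that fits in memory or whether there is a better pattern for this.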
