Hi Community,

I have a 1 TB dataset containing records for 50 users; each user has about
20 GB of data on average.

I want to use Spark to train a machine learning model (e.g., an XGBoost tree
model) for each user, so the result would be 50 models. However, submitting
50 separate Spark jobs through 'spark-submit' is infeasible.

The model parameters and feature engineering steps would be exactly the same
for every user's data. Is there a way to train these 50 models in parallel
within a single Spark application?

Thanks!
