Before hardware optimization there is always software optimization. Are you using Datasets / DataFrames? Are you using the right data types (e.g. int where int is appropriate, avoiding string and char where possible)? Do you extract only the columns you actually need? What are the algorithm parameters?
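
To make the data type point concrete, here is a rough back-of-the-envelope sketch in plain Python (no Spark needed). It uses the row and column counts from the quoted message below; the per-value widths for int and double are the usual fixed sizes, while the 40-byte string average is purely an assumption for illustration, and the helper names are made up for this example. It ignores JVM object overhead and any compression, so treat the numbers as order-of-magnitude only.

```python
# Rough, uncompressed size of a 2.5 million row x 1300 column feature matrix
# for different column types. Row/column counts come from the quoted message.
ROWS = 2500000
COLS = 1300

# Bytes per value. The string figure is an ASSUMED illustrative average,
# not a measurement of the real data.
BYTES_PER_VALUE = {
    "int (32-bit)": 4,
    "double (64-bit)": 8,
    "string (assumed ~40 bytes avg)": 40,
}


def raw_size_gb(rows, cols, bytes_per_value):
    """Uncompressed size in GB (10^9 bytes), ignoring per-object overhead."""
    return rows * cols * bytes_per_value / 1e9


def estimate_all(rows=ROWS, cols=COLS):
    """Map each type label to its estimated size in GB."""
    return {
        name: raw_size_gb(rows, cols, width)
        for name, width in BYTES_PER_VALUE.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print("%-32s %8.1f GB" % (name, gb))
```

If most of the 1300 columns are not needed by the model, the same arithmetic with a smaller column count shows how much selecting only the required columns would save before adding any hardware.
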
> On 07 Jun 2016, at 13:09, Franc Carter <[email protected]> wrote:
>
> Hi,
>
> I am training a RandomForest Regression Model on Spark-1.6.1 (EMR) and am
> interested in how it might be best to scale it - e.g. more CPUs per instance,
> more memory per instance, more instances etc.
>
> I'm currently using 32 m3.xlarge instances for a training set with 2.5
> million rows, 1300 columns and a total size of 31GB (parquet)
>
> thanks
>
> --
> Franc
