There is too little information here to give an answer, if indeed an a priori answer is possible at all.
However, I would do the following on your test instances:

- Run jstat -gc on all your nodes. It might be that the GC is taking a lot of time.
- Poll with jstack semi-frequently. It can give you a fairly good idea of where in the code the time is being spent, in a non-invasive manner.

Phillip

On Mon, Feb 11, 2019 at 9:48 AM Aakash Basu <aakash.spark....@gmail.com> wrote:

> Hi,
>
> I ran a dataset of *200 columns and 0.2M records* in a cluster of *1
> master 18 GB, 2 slaves 32 GB each, **16 cores/slave*, took around *772
> minutes* for a *very large ML tuning based job* (training).
>
> Now, my requirement is to run the *same operation on 3M records*. Any
> idea on how we should proceed? Should we go for a vertical scaling or a
> horizontal one? How should this problem be approached in a
> stepwise/systematic manner?
>
> Thanks in advance.
>
> Regards,
> Aakash.
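
To put a rough number on the jstat suggestion above, here is a minimal Python sketch. It assumes you have captured the text output of `jstat -gc -t <pid>`, where the `-t` flag adds a `Timestamp` column (seconds since the JVM started), and it divides the cumulative `GCT` column by that timestamp. The function name `gc_time_fraction` and the sample output are made up for illustration, not taken from a real run. The set of columns differs between JDK versions, so the sketch looks columns up by header name rather than by position.

```python
def gc_time_fraction(jstat_output: str) -> float:
    """Estimate the share of JVM uptime spent in GC.

    Expects the text printed by `jstat -gc -t <pid>`: one header line
    followed by one or more sample lines. Only the last sample is used,
    since GCT and Timestamp are both cumulative.
    """
    lines = [ln for ln in jstat_output.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("need a header line and at least one sample line")

    header = lines[0].split()
    values = lines[-1].split()
    row = dict(zip(header, values))

    # Both columns are in seconds; fail loudly if either is missing.
    gct = float(row["GCT"])
    uptime = float(row["Timestamp"])
    if uptime <= 0:
        raise ValueError("Timestamp must be positive")
    return gct / uptime


# Synthetic sample for illustration only (not measured on any cluster).
SAMPLE = """\
Timestamp S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
1000.0 1024.0 1024.0 0.0 512.0 8192.0 4096.0 20480.0 10240.0 4864.0 4600.0 512.0 480.0 300 150.000 10 100.000 250.000
"""

if __name__ == "__main__":
    print(f"GC share of uptime: {gc_time_fraction(SAMPLE):.1%}")
```

If the share comes out high on the executors, GC is a likely suspect and worth looking at before deciding between vertical and horizontal scaling; if it is low, the jstack polling is the better place to look next.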