There is too little information here to give an answer, if an answer is
possible a priori at all.
However, I would do the following on your test instances:
- Run jstat -gc on all your nodes. It might be that the GC is taking a lot
of time.
- Poll with jstack semi-frequently. It can give you a fairly good idea
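
For the jstat check, a rough way to turn the output into a single number is to divide the cumulative GC time (the GCT column) by the JVM uptime. The sketch below assumes you capture the output of `jstat -gc -t <pid>`, where `-t` adds a Timestamp column holding seconds since JVM start. The function name and the sample numbers are made up for illustration and are not from your cluster.

```python
def gc_time_fraction(jstat_output: str) -> float:
    """Fraction of JVM uptime spent in GC, from `jstat -gc -t <pid>` output.

    Uses the header row to locate columns, so it does not depend on the
    exact column set of a particular JDK version.
    """
    rows = [ln.split() for ln in jstat_output.strip().splitlines() if ln.strip()]
    header, values = rows[0], rows[-1]  # last row = most recent sample
    sample = dict(zip(header, values))
    # Some locales print a decimal comma; normalise before parsing.
    uptime = float(sample["Timestamp"].replace(",", "."))
    gc_total = float(sample["GCT"].replace(",", "."))
    return gc_total / uptime


# Hypothetical sample output (illustrative values only).
SAMPLE = """
Timestamp    S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
   1200.0 1024.0 1024.0    0.0  512.0  8192.0   4096.0   20480.0    10240.0  4864.0 4500.0  512.0  400.0   150   30.000    12  270.000  300.000
"""

if __name__ == "__main__":
    print(f"GC share of uptime: {gc_time_fraction(SAMPLE):.1%}")
```

If the share is more than a few percent on the executors, GC is worth looking at before anything else. What counts as "too much" depends on the job, so treat any threshold as a judgment call.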
Hi,
I ran a dataset of *200 columns and 0.2M records* on a cluster of *1 master
(18 GB) and 2 slaves (32 GB each, 16 cores/slave)*. It took around
*772 minutes* for a *very large ML tuning-based job* (training).
Now, my requirement is to run the *same operation on 3M records*. Any idea
on how we should