The dataset was already in a sparse vector representation before being fed into LogisticRegression.
On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu <weichen...@databricks.com> wrote:
> Hi Qian,
>
> Does your dataset use the sparse vector format?
>
> On Mon, Apr 22, 2019 at 5:03 PM Qian He <hq.ja...@gmail.com> wrote:
>> Hi all,
>>
>> I'm using Spark's LogisticRegression to fit a dataset. Each row of
>> the data has 1.7 million columns, but it is sparse, with only hundreds of
>> 1s. The Spark UI reported high GC time while the model was being trained,
>> and my Spark application got stuck without any response. I have allocated
>> 100 executors with 8g each.
>>
>> Is there anything I should do to make the training succeed?
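Weichen's question matters because a dense row pays for every column, while a sparse row only pays for its non-zeros. Below is a back-of-the-envelope sketch of that difference. It assumes Spark ML's DenseVector stores one 8-byte double per column and its SparseVector stores a 4-byte int index plus an 8-byte double value per non-zero. It ignores JVM object headers and array overhead. The figure of 300 non-zeros per row is a hypothetical stand-in for "hundreds of 1s".

```python
# Rough payload-size estimate for one feature row, assuming Spark ML layouts:
#   DenseVector  -> one 8-byte double per column
#   SparseVector -> one 4-byte int index + one 8-byte double per non-zero
# JVM object headers and array overhead are ignored.

NUM_FEATURES = 1_700_000  # columns per row, as reported in the thread
NUM_NONZEROS = 300        # hypothetical stand-in for "hundreds of 1s"

BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_row_bytes(num_features: int) -> int:
    """Payload of a dense row: one double for every column."""
    return num_features * BYTES_PER_DOUBLE


def sparse_row_bytes(num_nonzeros: int) -> int:
    """Payload of a sparse row: one int index plus one double per non-zero."""
    return num_nonzeros * (BYTES_PER_INT + BYTES_PER_DOUBLE)


if __name__ == "__main__":
    dense = dense_row_bytes(NUM_FEATURES)
    sparse = sparse_row_bytes(NUM_NONZEROS)
    print(f"dense row:  {dense / 1e6:.1f} MB")
    print(f"sparse row: {sparse / 1e3:.1f} KB")
    print(f"dense/sparse ratio: about {dense // sparse}x")
```

With these assumptions, a dense row is about 13.6 MB and a sparse row is about 3.6 KB. Sparse input does not shrink everything, though. Per-feature quantities such as the coefficient vector still hold one value for each of the 1.7 million features. Each such array is therefore on the order of 13.6 MB regardless of how the rows are stored.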