Could you paste some of your code for diagnosis?
Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D>

On Wed, Sep 23, 2015 at 3:19 PM, Eugene Zhulenev <eugene.zhule...@gmail.com> wrote:

> We are running Apache Spark 1.5.0 (latest code from the 1.5 branch).
>
> We are running 2-3 LogisticRegression models in parallel (we'd love to run
> 10-20, actually). They are not really big at all, maybe 1-2 million rows in
> each model.
>
> The cluster itself and all executors look good: enough free memory and no
> exceptions or errors.
>
> However, I see very strange behavior inside the Spark driver. The allocated
> heap grows constantly, up to 30 GB in 1.5 hours, and then everything becomes
> super slow.
>
> We don't do any collect, and I really don't understand what is consuming
> all this memory. It looks like it's something inside LogisticRegression
> itself; however, I only see treeAggregate, which should not require so much
> memory to run.
>
> Any ideas?
>
> Plus, I don't see any GC pauses, so it looks like the memory is still held
> by something inside the driver.
>
> [image: Inline image 2]
> [image: Inline image 1]
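For readers not familiar with `treeAggregate`: the sketch below (plain Python, not the Spark API) illustrates the tree-reduction idea behind it, which is why the quoted message expects it to be light on driver memory. The function names and the fan-in value are illustrative assumptions, not Spark internals; in Spark the intermediate merge rounds run on executors, so the driver only receives the last few partial results.

```python
# Hedged sketch, not Spark code: contrasts a flat aggregate (driver
# receives every partition's partial result at once) with a tree
# aggregate (partials merged pairwise in rounds before reaching the
# driver). Names and fan_in are illustrative assumptions.

from functools import reduce

def flat_aggregate(partials, combine):
    # Plain aggregate: all partials are combined in one place.
    return reduce(combine, partials)

def tree_aggregate(partials, combine, fan_in=2):
    # Merge partials level by level; each round shrinks the list by
    # the fan-in factor until a single result remains.
    level = list(partials)
    while len(level) > 1:
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

if __name__ == "__main__":
    # Toy per-partition "gradient" partials.
    partials = [1, 2, 3, 4, 5, 6, 7, 8]
    add = lambda a, b: a + b
    # Both strategies produce the same result; only where the merging
    # happens differs.
    assert flat_aggregate(partials, add) == tree_aggregate(partials, add)
    print(tree_aggregate(partials, add))  # -> 36
```

Because the combine steps happen in rounds, a driver-side memory leak would not normally be explained by `treeAggregate` itself, which is consistent with the confusion expressed in the message.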