We are running Apache Spark 1.5.0 (latest code from the 1.5 branch).

We are running 2-3 LogisticRegression models in parallel (we'd actually love
to run 10-20). They are not big at all: maybe 1-2 million rows per model.
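
For context, this is roughly how we kick the models off (a simplified sketch,
not our exact code; `trainingSets`, the iteration count, and the
regularization value are placeholders). Each fit() blocks in its own Future,
so several training jobs are submitted to the cluster at the same time:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.sql.DataFrame

// Each fit() runs in its own thread while the driver coordinates that job,
// so 2-3 (or more) models train on the cluster concurrently.
def fitAll(trainingSets: Seq[DataFrame]): Seq[LogisticRegressionModel] = {
  val futures = trainingSets.map { df =>
    Future {
      new LogisticRegression()
        .setMaxIter(100)      // placeholder values
        .setRegParam(0.01)
        .fit(df)
    }
  }
  futures.map(f => Await.result(f, Duration.Inf))
}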

The cluster itself and all executors look good: plenty of free memory and no
exceptions or errors.

However, I see very strange behavior inside the Spark driver: the allocated
heap grows constantly. It reaches about 30 GB in 1.5 hours, and then
everything becomes extremely slow.

We don't do any collect(), and I really don't understand what is consuming
all this memory. It looks like it's something inside LogisticRegression
itself, but the only aggregation back to the driver that I see is
treeAggregate, which should not require that much memory to run.
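
For reference, the pattern I mean is roughly the following (a hand-written
sketch of the per-iteration treeAggregate, not the actual MLlib code; the
gradient math is omitted). Each partition folds its points into a single
array of length numFeatures, partial sums are merged in a tree, and the
driver only receives one such array plus a scalar loss per iteration, so the
driver-side footprint of each call should be tiny:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sketch of the per-iteration aggregation pattern (gradient math omitted):
// every partition folds its points into one Array[Double] of length
// numFeatures, partial sums are combined in a tree of depth 2, and the
// driver receives a single array plus a scalar loss.
def gradientSum(data: RDD[LabeledPoint], numFeatures: Int): (Array[Double], Double) = {
  data.treeAggregate((new Array[Double](numFeatures), 0.0))(
    seqOp = { case ((grad, loss), point) =>
      // placeholder: the real code adds the logistic gradient of `point` into `grad`
      (grad, loss)
    },
    combOp = { case ((grad1, loss1), (grad2, loss2)) =>
      var i = 0
      while (i < grad1.length) { grad1(i) += grad2(i); i += 1 }
      (grad1, loss1 + loss2)
    },
    depth = 2)
}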

Any ideas?

Also, I don't see any GC pauses, so it looks like the memory is still being
held by something inside the driver.
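
To double-check that, I can run something like the following from a
background thread inside the driver (just a rough sketch using the standard
JMX beans; the helper name and the interval are arbitrary) to log heap usage
and cumulative GC time:

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Logs driver heap usage and cumulative GC time once a minute.
def logDriverHeap(): Unit = {
  val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
  val gcMs = ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
  println(s"driver heap: used=${heap.getUsed >> 20} MB, " +
    s"committed=${heap.getCommitted >> 20} MB, total GC time=$gcMs ms")
}

val logger = new Thread(new Runnable {
  override def run(): Unit = {
    while (true) { logDriverHeap(); Thread.sleep(60000) }
  }
}, "driver-heap-logger")
logger.setDaemon(true)
logger.start()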

