I am running a job that consistently crashes the Spark driver, and I am
unable to diagnose the cause. I am running on Databricks, but I am posting
my question here in case I am doing something that is clearly a problematic
operation in Spark.

I am trying to do a machine learning-type task:
1) I simulate some data as an RDD of MLlib LabeledPoints (8 partitions, 10k
points per partition, each point with 40 dimensions).
2) I train a model on each partition, using a random forest library that I
wrote myself. I use mapPartitions and end up with an RDD of RandomForest
objects, one per partition.
3) I try to force computation of the RDD of RandomForests by calling
count(). I also call checkpoint() on it.
4) I generate a local test set of 1000 points on the master node.
5) I do batch fitting of the test points. This procedure is a bit
complicated, but for each batch (of 100 points) each partition must
calculate 100 different 40x40 matrices, which are then communicated back to
the master. I do this by broadcasting the batch and applying a map function
to the RDD of forests trained in step 2. This creates an RDD of
Array[(Array[Double], Array[Double])], which I then collect(). The result
should total only a few megabytes. A simplified sketch of the pipeline is
included below.
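
Roughly, the code looks like the sketch below. The class MyRandomForest and
its batchStatistics method are placeholders for my own library (I have
simplified the per-point computation); the Spark calls are the ones I
actually make.

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

import scala.util.Random

// Placeholder for my hand-rolled random forest library.
class MyRandomForest(val points: Array[LabeledPoint]) extends Serializable {
  // For each test point, produce a pair of length-40 arrays; this stands in
  // for the per-point 40x40 matrix computation described in step 5.
  def batchStatistics(batch: Array[LabeledPoint]): Array[(Array[Double], Array[Double])] =
    batch.map(p => (p.features.toArray, p.features.toArray))
}

object Sketch {
  def run(sc: SparkContext): Unit = {
    val dims = 40

    // 1) Simulate data: 8 partitions, 10k LabeledPoints each, 40 dimensions.
    val data: RDD[LabeledPoint] = sc.parallelize(0 until 8, 8).flatMap { _ =>
      val rng = new Random()
      Iterator.fill(10000)(
        LabeledPoint(rng.nextInt(2).toDouble,
          Vectors.dense(Array.fill(dims)(rng.nextGaussian())))
      )
    }

    // 2) Train one forest per partition.
    val forests: RDD[MyRandomForest] = data.mapPartitions { it =>
      Iterator.single(new MyRandomForest(it.toArray))
    }

    // 3) Checkpoint and force computation.
    // (sc.setCheckpointDir is assumed to have been set elsewhere.)
    forests.checkpoint()
    forests.count()

    // 4) Local test set of 1000 points, generated on the master node.
    val rng = new Random()
    val testSet: Array[LabeledPoint] = Array.fill(1000)(
      LabeledPoint(0.0, Vectors.dense(Array.fill(dims)(rng.nextGaussian())))
    )

    // 5) Batch fitting: broadcast each batch of 100 points, map over the
    //    forests, and collect the per-partition results on the master.
    testSet.grouped(100).foreach { batch =>
      val bcBatch: Broadcast[Array[LabeledPoint]] = sc.broadcast(batch)
      val results: Array[Array[(Array[Double], Array[Double])]] =
        forests.map(f => f.batchStatistics(bcBatch.value)).collect()
      // ... driver-side processing of results ...
    }
  }
}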

From here things go wrong. The first batch runs slowly, taking
approximately one minute. The following batches are faster, about 10
seconds each. Then the Spark driver crashes, usually around the 4th or 5th
batch.
I have two questions:
1) Why might the first batch be particularly slow, given that I have already
forced computation of the RDD it depends on, and there is, in principle, no
difference between the batches?
2) What might be causing the Spark driver to crash?

The code runs fine when I am running in local mode.

Thanks in advance; I can provide more details if necessary.


