Hi.
I have an RDD that I reuse across many iterations of an algorithm. To prevent
recomputation, I persist the RDD (and, incidentally, I also persist and
checkpoint its parents):
val consCostConstraintMap = consCost.join(constraintMap).map {
  case (cid, (costs, (mid1, _, mid2, _, _))) =>
    (cid, (costs, mid1, mid2))
}
consCostConstraintMap.setName("consCostConstraintMap")
consCostConstraintMap.persist(MEMORY_AND_DISK_SER)
...
Later on, in an iterative loop:
val update = updatedTrips.join(consCostConstraintMap).flatMap {
...
}.treeReduce()
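For context, here is a self-contained sketch of the persist/checkpoint pattern above. It runs in local mode with stand-in data and a placeholder checkpoint directory (the real job runs on a cluster, and the real tuples have more fields):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Stand-in data: the real consCost / constraintMap come from earlier stages.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("persistSketch"))
sc.setCheckpointDir("/tmp/spark-checkpoints") // placeholder directory

val consCost      = sc.parallelize(Seq((1L, 0.5), (2L, 0.7)))           // (cid, costs)
val constraintMap = sc.parallelize(Seq((1L, (10, 20)), (2L, (30, 40)))) // (cid, (mid1, mid2))

// Persist and checkpoint the parents, as described above.
consCost.persist(StorageLevel.MEMORY_AND_DISK_SER)
consCost.checkpoint()
constraintMap.persist(StorageLevel.MEMORY_AND_DISK_SER)
constraintMap.checkpoint()

val consCostConstraintMap = consCost.join(constraintMap).map {
  case (cid, (costs, (mid1, mid2))) => (cid, (costs, mid1, mid2))
}
consCostConstraintMap.setName("consCostConstraintMap")
consCostConstraintMap.persist(StorageLevel.MEMORY_AND_DISK_SER)

// An action materializes both the checkpoint files and the cache.
val materialized = consCostConstraintMap.collect().toMap
sc.stop()
```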
---------
I can see from the UI that consCostConstraintMap is in storage:

RDD Name: consCostConstraintMap
Storage Level: Memory Serialized 1x Replicated
Cached Partitions: 600
Fraction Cached: 100%
Size in Memory: 15.2 GB
Size in Tachyon: 0.0 B
Size on Disk: 0.0 B
---------
In the Jobs list I see the following pattern, where each treeReduce line
corresponds to one iteration of the loop:

Job Id | Description                         | Submitted           | Duration | Stages: Succeeded/Total | Tasks (all stages): Succeeded/Total
13     | treeReduce at reconstruct.scala:243 | 2015/05/22 16:27:11 | 2.9 min  | 16/16 (194 skipped)     | 9024/9024 (109225 skipped)
12     | treeReduce at reconstruct.scala:243 | 2015/05/22 16:24:16 | 2.9 min  | 16/16 (148 skipped)     | 9024/9024 (82725 skipped)
11     | treeReduce at reconstruct.scala:243 | 2015/05/22 16:21:21 | 2.9 min  | 16/16 (103 skipped)     | 9024/9024 (56280 skipped)
10     | treeReduce at reconstruct.scala:243 | 2015/05/22 16:18:28 | 2.9 min  | 16/16 (69 skipped)      | 9024/9024 (36980 skipped)
--------------
If I drill into one job (id 12), I see:

Completed Stages: 16
Skipped Stages: 148

Completed Stages (16):

Stage Id | Description                         | Submitted           | Duration | Tasks: Succeeded/Total | Input / Output / Shuffle Read / Shuffle Write
525      | treeReduce at reconstruct.scala:243 | 2015/05/22 16:27:09 | 42 ms    | 24/24                  | 21.7 KB
524      | ...                                 |                     |          |                        |
519      | map at reconstruct.scala:153        | 2015/05/22 16:24:16 | 1.2 min  | 600/600                | 14.8 GB / 8.4 GB

The last line, "map at reconstruct.scala:153", corresponds to
"val consCostConstraintMap = consCost.join(constraintMap).map {",
which I expected to have been cached.
Is there some way I can find out what it is spending 1.2 minutes doing? I
presume it is reading and writing gigabytes of data, but why? Everything
should already be in memory.
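The only lineage check I know of is `toDebugString`, which annotates cached RDDs in the dump. A minimal local-mode sketch with stand-in data of what I'd look for:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lineageSketch"))

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
rdd.count() // run an action so the cache is actually populated

// Once cached, the lineage dump marks the RDD with a "CachedPartitions: ..." line.
val lineage = rdd.toDebugString
println(lineage)
sc.stop()
```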
Any clues on where I should start?
tks