Hi.

I have an RDD that I use repeatedly through many iterations of an
algorithm. To prevent recomputation, I persist the RDD (and, incidentally,
I also persist and checkpoint its parents):


import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER

val consCostConstraintMap = consCost.join(constraintMap).map {
  case (cid, (costs, (mid1, _, mid2, _, _))) =>
    (cid, (costs, mid1, mid2))
}
consCostConstraintMap.setName("consCostConstraintMap")
consCostConstraintMap.persist(MEMORY_AND_DISK_SER)
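For what it's worth, my understanding is that persist() is only a hint:
nothing is cached until the first action materializes the RDD. A toy
plain-Scala analogy of that laziness (no Spark involved, just a lazy val):

```scala
// Toy analogy (plain Scala, not Spark): persist() marks a value for caching,
// but nothing is computed until something first uses it -- like a lazy val.
var timesComputed = 0
lazy val cached = { timesComputed += 1; 42 } // "persisted" but not yet materialized

assert(timesComputed == 0) // declaring it ran nothing
val first  = cached        // first use computes and caches the value
val second = cached        // later uses reuse the cached value
assert(timesComputed == 1 && first == 42 && second == 42)
```

If that analogy holds, the first job after the persist pays the computation
cost and later jobs should just reuse the cached blocks.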

...

Later on, in an iterative loop:

val update = updatedTrips.join(consCostConstraintMap).flatMap {
  ...
}.treeReduce(...)
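As a toy illustration of what each iteration's reduce computes: treeReduce
combines partial results pairwise in rounds rather than in one sequential
pass at the driver. This plain-Scala sketch (a hypothetical helper over a
Seq, not Spark's implementation) mimics the pattern:

```scala
// Toy sketch of a tree-style reduce over a plain Seq: pairwise-combine in
// rounds until one value remains. Spark's treeReduce does something similar
// across partitions to avoid one large reduce at the driver.
def treeReduceLike[A](xs: Seq[A])(f: (A, A) => A): A = {
  require(xs.nonEmpty, "empty input")
  if (xs.size == 1) xs.head
  else treeReduceLike(xs.grouped(2).map(_.reduce(f)).toSeq)(f)
}

assert(treeReduceLike(1 to 9)(_ + _) == 45) // same result as a plain reduce
```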

---------

I can see from the UI that consCostConstraintMap is in storage:

RDD Name:          consCostConstraintMap
Storage Level:     Memory Serialized 1x Replicated
Cached Partitions: 600
Fraction Cached:   100%
Size in Memory:    15.2 GB
Size in Tachyon:   0.0 B
Size on Disk:      0.0 B
---------
In the Jobs list I see the following pattern, where each treeReduce line
corresponds to one iteration of the loop:

Job Id  Description                          Submitted            Duration  Stages: Succeeded/Total  Tasks (all stages): Succeeded/Total
13      treeReduce at reconstruct.scala:243  2015/05/22 16:27:11  2.9 min   16/16 (194 skipped)      9024/9024 (109225 skipped)
12      treeReduce at reconstruct.scala:243  2015/05/22 16:24:16  2.9 min   16/16 (148 skipped)      9024/9024 (82725 skipped)
11      treeReduce at reconstruct.scala:243  2015/05/22 16:21:21  2.9 min   16/16 (103 skipped)      9024/9024 (56280 skipped)
10      treeReduce at reconstruct.scala:243  2015/05/22 16:18:28  2.9 min   16/16 (69 skipped)       9024/9024 (36980 skipped)

--------------
If I drill into one job (id 12) I see:

Completed Stages: 16
Skipped Stages:   148

Completed Stages (16)

Stage Id  Description                          Submitted            Duration  Tasks: Succeeded/Total  Input / Shuffle
525       treeReduce at reconstruct.scala:243  2015/05/22 16:27:09  42 ms     24/24                   21.7 KB ...
524       ...
519       map at reconstruct.scala:153         2015/05/22 16:24:16  1.2 min   600/600                 14.8 GB input, 8.4 GB shuffle write

The last line, map at reconstruct.scala:153, corresponds to
"val consCostConstraintMap = consCost.join(constraintMap).map {",
which I expected to have been cached.

Is there some way I can find out what it is spending 1.2 minutes doing? I
presume it is reading and writing gigabytes of data, but why? Everything
should already be in memory.

Any clues on where I should start?

Thanks
