I haven't been able to find any evidence in the logs that RDDs are being 
excluded. This is a test dataset, so it is quite small (<100k rows), and I'd be 
shocked if this were an OOM error. Where should I look in the UI to see whether 
RDDs are being excluded?

In case it helps, here's the full log output for the last boosting iteration. 
Nothing in there looks obviously wrong to me.
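
For what it's worth, one rough way to check the storage numbers in a log like this is to pull out the MemoryStore and BlockManagerInfo lines and look at how much memory is in use and which RDD blocks get cached. The Python below is only a sketch: the function names (summarize, free_gib) are made up for illustration, and the regexes assume the exact line formats that appear in this dump, including the hard line wraps.

```python
import re

# Patterns assume the log line formats seen in this dump; \s+ also spans
# the hard line wraps introduced by the mail client.
ENSURE = re.compile(
    r"ensureFreeSpace\((\d+)\)\s+called\s+with\s+curMem=(\d+),\s+maxMem=(\d+)"
)
ADDED_RDD = re.compile(
    r"Added\s+(rdd_\d+_\d+)\s+in\s+memory\s+on\s+(\S+)\s+"
    r"\(size:\s+([\d.]+\s+\w+),\s+free:\s+([\d.]+\s+\w+)\)"
)


def summarize(log_text):
    """Return (memory_samples, rdd_blocks) extracted from the log text.

    memory_samples: list of (requested, cur_mem, max_mem) byte counts.
    rdd_blocks: list of (block_id, host_port, size, free) strings.
    """
    memory = [tuple(int(g) for g in m.groups()) for m in ENSURE.finditer(log_text)]
    rdd_blocks = [m.groups() for m in ADDED_RDD.finditer(log_text)]
    return memory, rdd_blocks


def free_gib(requested, cur_mem, max_mem):
    """Free space in GiB once the requested bytes have been stored."""
    return (max_mem - cur_mem - requested) / 2**30


# Two entries copied from the log, wraps included.
SAMPLE = """15/02/09 19:45:13 INFO storage.MemoryStore: ensureFreeSpace(6239456) called 
with curMem=34841072600, maxMem=55566516879
15/02/09 19:45:27 INFO storage.BlockManagerInfo: Added rdd_19988_0 in memory on 
hadoop-011:46034 (size: 2.0 MB, free: 7.6 GB)
"""

memory, rdd_blocks = summarize(SAMPLE)
for requested, cur_mem, max_mem in memory:
    print("curMem=%d free=%.1f GiB" % (cur_mem, free_gib(requested, cur_mem, max_mem)))
for block_id, host, size, free in rdd_blocks:
    print(block_id, host, size, free)
```

Run over the whole log, this would show whether curMem keeps climbing from one stage to the next and whether every expected rdd_* block actually gets added on the executors.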

cheers
chris

---------

  total: 15.188428092
  findSplitsBins: 4.37387326
  findBestSplits: 5.815247303
  chooseSplits: 5.814782657
15/02/09 19:45:13 INFO spark.SparkContext: Starting job: take at 
DecisionTreeMetadata.scala:110
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Got job 7993 (take at 
DecisionTreeMetadata.scala:110) with 1 output partitions (allowLocal=true)
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Final stage: Stage 12988(take at 
DecisionTreeMetadata.scala:110)
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Submitting Stage 12988 
(MapPartitionsRDD[19985] at retag at RandomForest.scala:136), which has no 
missing parents
15/02/09 19:45:13 INFO storage.MemoryStore: ensureFreeSpace(6239456) called 
with curMem=34841072600, maxMem=55566516879
15/02/09 19:45:13 INFO storage.MemoryStore: Block broadcast_17984 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:13 INFO storage.MemoryStore: ensureFreeSpace(2476087) called 
with curMem=34847312056, maxMem=55566516879
15/02/09 19:45:13 INFO storage.MemoryStore: Block broadcast_17984_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:13 INFO storage.BlockManagerInfo: Added broadcast_17984_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:13 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17984_piece0
15/02/09 19:45:13 INFO spark.SparkContext: Created broadcast 17984 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:13 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from 
Stage 12988 (MapPartitionsRDD[19985] at retag at RandomForest.scala:136)
15/02/09 19:45:13 INFO scheduler.TaskSchedulerImpl: Adding task set 12988.0 
with 1 tasks
15/02/09 19:45:13 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12988.0 (TID 24976, hadoop-011, PROCESS_LOCAL, 1371 bytes)
15/02/09 19:45:13 INFO storage.BlockManagerInfo: Added broadcast_17984_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:14 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12988.0 (TID 24976) in 150 ms on hadoop-011 (1/1)
15/02/09 19:45:14 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12988.0, 
whose tasks have all completed, from pool
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Stage 12988 (take at 
DecisionTreeMetadata.scala:110) finished in 0.150 s
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Job 7993 finished: take at 
DecisionTreeMetadata.scala:110, took 0.292881 s
15/02/09 19:45:14 INFO spark.SparkContext: Starting job: count at 
DecisionTreeMetadata.scala:111
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Got job 7994 (count at 
DecisionTreeMetadata.scala:111) with 2 output partitions (allowLocal=false)
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Final stage: Stage 12989(count 
at DecisionTreeMetadata.scala:111)
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Submitting Stage 12989 
(MapPartitionsRDD[19985] at retag at RandomForest.scala:136), which has no 
missing parents
15/02/09 19:45:14 INFO storage.MemoryStore: ensureFreeSpace(6239424) called 
with curMem=34849788143, maxMem=55566516879
15/02/09 19:45:14 INFO storage.MemoryStore: Block broadcast_17985 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:14 INFO storage.MemoryStore: ensureFreeSpace(2476058) called 
with curMem=34856027567, maxMem=55566516879
15/02/09 19:45:14 INFO storage.MemoryStore: Block broadcast_17985_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:14 INFO storage.BlockManagerInfo: Added broadcast_17985_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:14 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17985_piece0
15/02/09 19:45:14 INFO spark.SparkContext: Created broadcast 17985 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:14 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12989 (MapPartitionsRDD[19985] at retag at RandomForest.scala:136)
15/02/09 19:45:14 INFO scheduler.TaskSchedulerImpl: Adding task set 12989.0 
with 2 tasks
15/02/09 19:45:14 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12989.0 (TID 24977, hadoop-010, PROCESS_LOCAL, 1371 bytes)
15/02/09 19:45:14 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12989.0 (TID 24978, hadoop-011, PROCESS_LOCAL, 1371 bytes)
15/02/09 19:45:14 INFO storage.BlockManagerInfo: Added broadcast_17985_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:14 INFO storage.BlockManagerInfo: Added broadcast_17985_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:18 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12989.0 (TID 24978) in 4401 ms on hadoop-011 (1/2)
15/02/09 19:45:18 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12989.0 (TID 24977) in 4801 ms on hadoop-010 (2/2)
15/02/09 19:45:18 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12989.0, 
whose tasks have all completed, from pool
15/02/09 19:45:18 INFO scheduler.DAGScheduler: Stage 12989 (count at 
DecisionTreeMetadata.scala:111) finished in 4.801 s
15/02/09 19:45:18 INFO scheduler.DAGScheduler: Job 7994 finished: count at 
DecisionTreeMetadata.scala:111, took 4.946490 s
15/02/09 19:45:19 INFO spark.SparkContext: Starting job: collect at 
DecisionTree.scala:981
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Got job 7995 (collect at 
DecisionTree.scala:981) with 2 output partitions (allowLocal=false)
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Final stage: Stage 12990(collect 
at DecisionTree.scala:981)
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Submitting Stage 12990 
(PartitionwiseSampledRDD[19986] at sample at DecisionTree.scala:981), which has 
no missing parents
15/02/09 19:45:19 INFO storage.MemoryStore: ensureFreeSpace(6240024) called 
with curMem=34858503625, maxMem=55566516879
15/02/09 19:45:19 INFO storage.MemoryStore: Block broadcast_17986 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:19 INFO storage.MemoryStore: ensureFreeSpace(2478110) called 
with curMem=34864743649, maxMem=55566516879
15/02/09 19:45:19 INFO storage.MemoryStore: Block broadcast_17986_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:19 INFO storage.BlockManagerInfo: Added broadcast_17986_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:19 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17986_piece0
15/02/09 19:45:19 INFO spark.SparkContext: Created broadcast 17986 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:19 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12990 (PartitionwiseSampledRDD[19986] at sample at DecisionTree.scala:981)
15/02/09 19:45:19 INFO scheduler.TaskSchedulerImpl: Adding task set 12990.0 
with 2 tasks
15/02/09 19:45:19 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12990.0 (TID 24979, hadoop-011, PROCESS_LOCAL, 1480 bytes)
15/02/09 19:45:19 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12990.0 (TID 24980, hadoop-010, PROCESS_LOCAL, 1480 bytes)
15/02/09 19:45:19 INFO storage.BlockManagerInfo: Added broadcast_17986_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:19 INFO storage.BlockManagerInfo: Added broadcast_17986_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12990.0 (TID 24979) in 4203 ms on hadoop-011 (1/2)
15/02/09 19:45:23 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12990.0 (TID 24980) in 4282 ms on hadoop-010 (2/2)
15/02/09 19:45:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12990.0, 
whose tasks have all completed, from pool
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Stage 12990 (collect at 
DecisionTree.scala:981) finished in 4.291 s
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Job 7995 finished: collect at 
DecisionTree.scala:981, took 4.428949 s
15/02/09 19:45:23 INFO storage.MemoryStore: ensureFreeSpace(48) called with 
curMem=34867221759, maxMem=55566516879
15/02/09 19:45:23 INFO storage.MemoryStore: Block broadcast_17987 stored as 
values in memory (estimated size 48.0 B, free 19.3 GB)
15/02/09 19:45:23 INFO storage.MemoryStore: ensureFreeSpace(81) called with 
curMem=34867221807, maxMem=55566516879
15/02/09 19:45:23 INFO storage.MemoryStore: Block broadcast_17987_piece0 stored 
as bytes in memory (estimated size 81.0 B, free 19.3 GB)
15/02/09 19:45:23 INFO storage.BlockManagerInfo: Added broadcast_17987_piece0 
in memory on hadoop-009:52371 (size: 81.0 B, free: 42.5 GB)
15/02/09 19:45:23 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17987_piece0
15/02/09 19:45:23 INFO spark.SparkContext: Created broadcast 17987 from 
broadcast at DecisionTree.scala:596
15/02/09 19:45:23 INFO spark.SparkContext: Starting job: collectAsMap at 
DecisionTree.scala:646
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Registering RDD 19989 
(mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Got job 7996 (collectAsMap at 
DecisionTree.scala:646) with 2 output partitions (allowLocal=false)
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Final stage: Stage 
12992(collectAsMap at DecisionTree.scala:646)
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Parents of final stage: 
List(Stage 12991)
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Missing parents: List(Stage 
12991)
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Submitting Stage 12991 
(MapPartitionsRDD[19989] at mapPartitions at DecisionTree.scala:617), which has 
no missing parents
15/02/09 19:45:23 INFO storage.MemoryStore: ensureFreeSpace(6249664) called 
with curMem=34867221888, maxMem=55566516879
15/02/09 19:45:23 INFO storage.MemoryStore: Block broadcast_17988 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:23 INFO storage.MemoryStore: ensureFreeSpace(2478678) called 
with curMem=34873471552, maxMem=55566516879
15/02/09 19:45:23 INFO storage.MemoryStore: Block broadcast_17988_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:23 INFO storage.BlockManagerInfo: Added broadcast_17988_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:23 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17988_piece0
15/02/09 19:45:23 INFO spark.SparkContext: Created broadcast 17988 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:23 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12991 (MapPartitionsRDD[19989] at mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:23 INFO scheduler.TaskSchedulerImpl: Adding task set 12991.0 
with 2 tasks
15/02/09 19:45:23 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12991.0 (TID 24981, hadoop-011, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:23 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12991.0 (TID 24982, hadoop-010, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:23 INFO storage.BlockManagerInfo: Added broadcast_17988_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:23 INFO storage.BlockManagerInfo: Added broadcast_17988_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:27 INFO storage.BlockManagerInfo: Added rdd_19988_0 in memory on 
hadoop-011:46034 (size: 2.0 MB, free: 7.6 GB)
15/02/09 19:45:27 INFO storage.BlockManagerInfo: Added broadcast_17987_piece0 
in memory on hadoop-011:46034 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:27 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12991.0 (TID 24981) in 3448 ms on hadoop-011 (1/2)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added rdd_19988_1 in memory on 
hadoop-010:51816 (size: 2.0 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17987_piece0 
in memory on hadoop-010:51816 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12991.0 (TID 24982) in 4384 ms on hadoop-010 (2/2)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12991.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12991 (mapPartitions at 
DecisionTree.scala:617) finished in 4.385 s
15/02/09 19:45:28 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/09 19:45:28 INFO scheduler.DAGScheduler: running: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: waiting: Set(Stage 12992)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: failed: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 12992: 
List()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12992 
(MapPartitionsRDD[19991] at map at DecisionTree.scala:637), which is now 
runnable
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(8280) called with 
curMem=34875950230, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17989 stored as 
values in memory (estimated size 8.1 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(3526) called with 
curMem=34875958510, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17989_piece0 stored 
as bytes in memory (estimated size 3.4 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17989_piece0 
in memory on hadoop-009:52371 (size: 3.4 KB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17989_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17989 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12992 (MapPartitionsRDD[19991] at map at DecisionTree.scala:637)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12992.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12992.0 (TID 24983, hadoop-013, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12992.0 (TID 24984, hadoop-010, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17989_piece0 
in memory on hadoop-010:51816 (size: 3.4 KB, free: 7.6 GB)
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4995 to sparkExecutor@hadoop-010:45084
15/02/09 19:45:28 INFO spark.MapOutputTrackerMaster: Size of output statuses 
for shuffle 4995 is 158 bytes
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17989_piece0 
in memory on hadoop-013:50803 (size: 3.4 KB, free: 10.3 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12992.0 (TID 24984) in 9 ms on hadoop-010 (1/2)
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4995 to sparkExecutor@hadoop-013:45260
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17987_piece0 
in memory on hadoop-013:50803 (size: 81.0 B, free: 10.3 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12992.0 (TID 24983) in 24 ms on hadoop-013 (2/2)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12992.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12992 (collectAsMap at 
DecisionTree.scala:646) finished in 0.025 s
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Job 7996 finished: collectAsMap 
at DecisionTree.scala:646, took 4.558422 s
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(48) called with 
curMem=34875962036, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17990 stored as 
values in memory (estimated size 48.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(81) called with 
curMem=34875962084, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17990_piece0 stored 
as bytes in memory (estimated size 81.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17990_piece0 
in memory on hadoop-009:52371 (size: 81.0 B, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17990_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17990 from 
broadcast at DecisionTree.scala:596
15/02/09 19:45:28 INFO spark.SparkContext: Starting job: collectAsMap at 
DecisionTree.scala:646
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Registering RDD 19992 
(mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Got job 7997 (collectAsMap at 
DecisionTree.scala:646) with 2 output partitions (allowLocal=false)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Final stage: Stage 
12994(collectAsMap at DecisionTree.scala:646)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Parents of final stage: 
List(Stage 12993)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents: List(Stage 
12993)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12993 
(MapPartitionsRDD[19992] at mapPartitions at DecisionTree.scala:617), which has 
no missing parents
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(6249992) called 
with curMem=34875962165, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17991 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(2478823) called 
with curMem=34882212157, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17991_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17991_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17991_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17991 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12993 (MapPartitionsRDD[19992] at mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12993.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12993.0 (TID 24985, hadoop-010, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12993.0 (TID 24986, hadoop-011, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17991_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17991_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17990_piece0 
in memory on hadoop-010:51816 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17990_piece0 
in memory on hadoop-011:46034 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12993.0 (TID 24985) in 159 ms on hadoop-010 (1/2)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12993.0 (TID 24986) in 159 ms on hadoop-011 (2/2)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12993.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12993 (mapPartitions at 
DecisionTree.scala:617) finished in 0.161 s
15/02/09 19:45:28 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/09 19:45:28 INFO scheduler.DAGScheduler: running: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: waiting: Set(Stage 12994)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: failed: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 12994: 
List()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12994 
(MapPartitionsRDD[19994] at map at DecisionTree.scala:637), which is now 
runnable
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(8344) called with 
curMem=34884690980, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17992 stored as 
values in memory (estimated size 8.1 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(3566) called with 
curMem=34884699324, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17992_piece0 stored 
as bytes in memory (estimated size 3.5 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17992_piece0 
in memory on hadoop-009:52371 (size: 3.5 KB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17992_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17992 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12994 (MapPartitionsRDD[19994] at map at DecisionTree.scala:637)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12994.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12994.0 (TID 24987, hadoop-012, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12994.0 (TID 24988, hadoop-010, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17992_piece0 
in memory on hadoop-010:51816 (size: 3.5 KB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17992_piece0 
in memory on hadoop-012:43314 (size: 3.5 KB, free: 10.3 GB)
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4996 to sparkExecutor@hadoop-010:45084
15/02/09 19:45:28 INFO spark.MapOutputTrackerMaster: Size of output statuses 
for shuffle 4996 is 158 bytes
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4996 to sparkExecutor@hadoop-012:54654
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12994.0 (TID 24988) in 11 ms on hadoop-010 (1/2)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17990_piece0 
in memory on hadoop-012:43314 (size: 81.0 B, free: 10.3 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12994.0 (TID 24987) in 18 ms on hadoop-012 (2/2)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12994.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12994 (collectAsMap at 
DecisionTree.scala:646) finished in 0.018 s
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Job 7997 finished: collectAsMap 
at DecisionTree.scala:646, took 0.324445 s
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(48) called with 
curMem=34884702890, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17993 stored as 
values in memory (estimated size 48.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(81) called with 
curMem=34884702938, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17993_piece0 stored 
as bytes in memory (estimated size 81.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17993_piece0 
in memory on hadoop-009:52371 (size: 81.0 B, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17993_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17993 from 
broadcast at DecisionTree.scala:596
15/02/09 19:45:28 INFO spark.SparkContext: Starting job: collectAsMap at 
DecisionTree.scala:646
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Registering RDD 19995 
(mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Got job 7998 (collectAsMap at 
DecisionTree.scala:646) with 2 output partitions (allowLocal=false)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Final stage: Stage 
12996(collectAsMap at DecisionTree.scala:646)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Parents of final stage: 
List(Stage 12995)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents: List(Stage 
12995)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12995 
(MapPartitionsRDD[19995] at mapPartitions at DecisionTree.scala:617), which has 
no missing parents
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(6250536) called 
with curMem=34884703019, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17994 stored as 
values in memory (estimated size 6.0 MB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(2479057) called 
with curMem=34890953555, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17994_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17994_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17994_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17994 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12995 (MapPartitionsRDD[19995] at mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12995.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12995.0 (TID 24989, hadoop-011, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12995.0 (TID 24990, hadoop-010, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17994_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17994_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17993_piece0 
in memory on hadoop-011:46034 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17993_piece0 
in memory on hadoop-010:51816 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12995.0 (TID 24989) in 160 ms on hadoop-011 (1/2)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12995.0 (TID 24990) in 163 ms on hadoop-010 (2/2)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12995.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12995 (mapPartitions at 
DecisionTree.scala:617) finished in 0.163 s
15/02/09 19:45:28 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/09 19:45:28 INFO scheduler.DAGScheduler: running: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: waiting: Set(Stage 12996)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: failed: Set()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 12996: 
List()
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12996 
(MapPartitionsRDD[19997] at map at DecisionTree.scala:637), which is now 
runnable
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(8464) called with 
curMem=34893432612, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17995 stored as 
values in memory (estimated size 8.3 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(3615) called with 
curMem=34893441076, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17995_piece0 stored 
as bytes in memory (estimated size 3.5 KB, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17995_piece0 
in memory on hadoop-009:52371 (size: 3.5 KB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17995_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17995 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12996 (MapPartitionsRDD[19997] at map at DecisionTree.scala:637)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12996.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12996.0 (TID 24991, hadoop-013, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12996.0 (TID 24992, hadoop-010, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17995_piece0 
in memory on hadoop-010:51816 (size: 3.5 KB, free: 7.6 GB)
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4997 to sparkExecutor@hadoop-010:45084
15/02/09 19:45:28 INFO spark.MapOutputTrackerMaster: Size of output statuses 
for shuffle 4997 is 158 bytes
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17995_piece0 
in memory on hadoop-013:50803 (size: 3.5 KB, free: 10.3 GB)
15/02/09 19:45:28 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4997 to sparkExecutor@hadoop-013:45260
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12996.0 (TID 24992) in 10 ms on hadoop-010 (1/2)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17993_piece0 
in memory on hadoop-013:50803 (size: 81.0 B, free: 10.3 GB)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12996.0 (TID 24991) in 23 ms on hadoop-013 (2/2)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Stage 12996 (collectAsMap at 
DecisionTree.scala:646) finished in 0.023 s
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12996.0, 
whose tasks have all completed, from pool
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Job 7998 finished: collectAsMap 
at DecisionTree.scala:646, took 0.336750 s
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(48) called with 
curMem=34893444691, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17996 stored as 
values in memory (estimated size 48.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(81) called with 
curMem=34893444739, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17996_piece0 stored 
as bytes in memory (estimated size 81.0 B, free 19.3 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17996_piece0 
in memory on hadoop-009:52371 (size: 81.0 B, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17996_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17996 from 
broadcast at DecisionTree.scala:596
15/02/09 19:45:28 INFO spark.SparkContext: Starting job: collectAsMap at 
DecisionTree.scala:646
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Registering RDD 19998 
(mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Got job 7999 (collectAsMap at 
DecisionTree.scala:646) with 2 output partitions (allowLocal=false)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Final stage: Stage 
12998(collectAsMap at DecisionTree.scala:646)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Parents of final stage: 
List(Stage 12997)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Missing parents: List(Stage 
12997)
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting Stage 12997 
(MapPartitionsRDD[19998] at mapPartitions at DecisionTree.scala:617), which has 
no missing parents
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(6251432) called 
with curMem=34893444820, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17997 stored as 
values in memory (estimated size 6.0 MB, free 19.2 GB)
15/02/09 19:45:28 INFO storage.MemoryStore: ensureFreeSpace(2479461) called 
with curMem=34899696252, maxMem=55566516879
15/02/09 19:45:28 INFO storage.MemoryStore: Block broadcast_17997_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.2 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17997_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:28 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17997_piece0
15/02/09 19:45:28 INFO spark.SparkContext: Created broadcast 17997 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12997 (MapPartitionsRDD[19998] at mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:28 INFO scheduler.TaskSchedulerImpl: Adding task set 12997.0 
with 2 tasks
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12997.0 (TID 24993, hadoop-011, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12997.0 (TID 24994, hadoop-010, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17997_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:28 INFO storage.BlockManagerInfo: Added broadcast_17997_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17996_piece0 
in memory on hadoop-010:51816 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17996_piece0 
in memory on hadoop-011:46034 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12997.0 (TID 24993) in 158 ms on hadoop-011 (1/2)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12997.0 (TID 24994) in 158 ms on hadoop-010 (2/2)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12997.0, 
whose tasks have all completed, from pool
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Stage 12997 (mapPartitions at 
DecisionTree.scala:617) finished in 0.160 s
15/02/09 19:45:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/09 19:45:29 INFO scheduler.DAGScheduler: running: Set()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: waiting: Set(Stage 12998)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: failed: Set()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Missing parents for Stage 12998: 
List()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting Stage 12998 
(MapPartitionsRDD[20000] at map at DecisionTree.scala:637), which is now 
runnable
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(8712) called with 
curMem=34902175713, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_17998 stored as 
values in memory (estimated size 8.5 KB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(3717) called with 
curMem=34902184425, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_17998_piece0 stored 
as bytes in memory (estimated size 3.6 KB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17998_piece0 
in memory on hadoop-009:52371 (size: 3.6 KB, free: 42.5 GB)
15/02/09 19:45:29 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17998_piece0
15/02/09 19:45:29 INFO spark.SparkContext: Created broadcast 17998 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12998 (MapPartitionsRDD[20000] at map at DecisionTree.scala:637)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Adding task set 12998.0 
with 2 tasks
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12998.0 (TID 24995, hadoop-010, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12998.0 (TID 24996, hadoop-011, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17998_piece0 
in memory on hadoop-010:51816 (size: 3.6 KB, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17998_piece0 
in memory on hadoop-011:46034 (size: 3.6 KB, free: 7.6 GB)
15/02/09 19:45:29 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4998 to sparkExecutor@hadoop-011:52133
15/02/09 19:45:29 INFO spark.MapOutputTrackerMaster: Size of output statuses 
for shuffle 4998 is 158 bytes
15/02/09 19:45:29 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4998 to sparkExecutor@hadoop-010:45084
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12998.0 (TID 24996) in 10 ms on hadoop-011 (1/2)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12998.0 (TID 24995) in 12 ms on hadoop-010 (2/2)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Stage 12998 (collectAsMap at 
DecisionTree.scala:646) finished in 0.012 s
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12998.0, 
whose tasks have all completed, from pool
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Job 7999 finished: collectAsMap 
at DecisionTree.scala:646, took 0.318375 s
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(48) called with 
curMem=34902188142, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_17999 stored as 
values in memory (estimated size 48.0 B, free 19.2 GB)
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(81) called with 
curMem=34902188190, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_17999_piece0 stored 
as bytes in memory (estimated size 81.0 B, free 19.2 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17999_piece0 
in memory on hadoop-009:52371 (size: 81.0 B, free: 42.5 GB)
15/02/09 19:45:29 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_17999_piece0
15/02/09 19:45:29 INFO spark.SparkContext: Created broadcast 17999 from 
broadcast at DecisionTree.scala:596
15/02/09 19:45:29 INFO spark.SparkContext: Starting job: collectAsMap at 
DecisionTree.scala:646
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Registering RDD 20001 
(mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Got job 8000 (collectAsMap at 
DecisionTree.scala:646) with 2 output partitions (allowLocal=false)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Final stage: Stage 
13000(collectAsMap at DecisionTree.scala:646)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Parents of final stage: 
List(Stage 12999)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Missing parents: List(Stage 
12999)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting Stage 12999 
(MapPartitionsRDD[20001] at mapPartitions at DecisionTree.scala:617), which has 
no missing parents
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(6253384) called 
with curMem=34902188271, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_18000 stored as 
values in memory (estimated size 6.0 MB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(2480242) called 
with curMem=34908441655, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_18000_piece0 stored 
as bytes in memory (estimated size 2.4 MB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18000_piece0 
in memory on hadoop-009:52371 (size: 2.4 MB, free: 42.5 GB)
15/02/09 19:45:29 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_18000_piece0
15/02/09 19:45:29 INFO spark.SparkContext: Created broadcast 18000 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 12999 (MapPartitionsRDD[20001] at mapPartitions at DecisionTree.scala:617)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Adding task set 12999.0 
with 2 tasks
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
12999.0 (TID 24997, hadoop-011, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
12999.0 (TID 24998, hadoop-010, PROCESS_LOCAL, 1360 bytes)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18000_piece0 
in memory on hadoop-010:51816 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18000_piece0 
in memory on hadoop-011:46034 (size: 2.4 MB, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17999_piece0 
in memory on hadoop-011:46034 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17999_piece0 
in memory on hadoop-010:51816 (size: 81.0 B, free: 7.6 GB)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
12999.0 (TID 24997) in 158 ms on hadoop-011 (1/2)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
12999.0 (TID 24998) in 162 ms on hadoop-010 (2/2)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12999.0, 
whose tasks have all completed, from pool
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Stage 12999 (mapPartitions at 
DecisionTree.scala:617) finished in 0.164 s
15/02/09 19:45:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/09 19:45:29 INFO scheduler.DAGScheduler: running: Set()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: waiting: Set(Stage 13000)
15/02/09 19:45:29 INFO scheduler.DAGScheduler: failed: Set()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Missing parents for Stage 13000: 
List()
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting Stage 13000 
(MapPartitionsRDD[20003] at map at DecisionTree.scala:637), which is now 
runnable
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(9200) called with 
curMem=34910921897, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_18001 stored as 
values in memory (estimated size 9.0 KB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.MemoryStore: ensureFreeSpace(3926) called with 
curMem=34910931097, maxMem=55566516879
15/02/09 19:45:29 INFO storage.MemoryStore: Block broadcast_18001_piece0 stored 
as bytes in memory (estimated size 3.8 KB, free 19.2 GB)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18001_piece0 
in memory on hadoop-009:52371 (size: 3.8 KB, free: 42.5 GB)
15/02/09 19:45:29 INFO storage.BlockManagerMaster: Updated info of block 
broadcast_18001_piece0
15/02/09 19:45:29 INFO spark.SparkContext: Created broadcast 18001 from 
broadcast at DAGScheduler.scala:829
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from 
Stage 13000 (MapPartitionsRDD[20003] at map at DecisionTree.scala:637)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Adding task set 13000.0 
with 2 tasks
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
13000.0 (TID 24999, hadoop-013, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 
13000.0 (TID 25000, hadoop-011, PROCESS_LOCAL, 1117 bytes)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18001_piece0 
in memory on hadoop-011:46034 (size: 3.8 KB, free: 7.6 GB)
15/02/09 19:45:29 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4999 to sparkExecutor@hadoop-011:52133
15/02/09 19:45:29 INFO spark.MapOutputTrackerMaster: Size of output statuses 
for shuffle 4999 is 160 bytes
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_18001_piece0 
in memory on hadoop-013:50803 (size: 3.8 KB, free: 10.3 GB)
15/02/09 19:45:29 INFO spark.MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 4999 to sparkExecutor@hadoop-013:45260
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 
13000.0 (TID 25000) in 11 ms on hadoop-011 (1/2)
15/02/09 19:45:29 INFO storage.BlockManagerInfo: Added broadcast_17999_piece0 
in memory on hadoop-013:50803 (size: 81.0 B, free: 10.3 GB)
15/02/09 19:45:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
13000.0 (TID 24999) in 26 ms on hadoop-013 (2/2)
15/02/09 19:45:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 13000.0, 
whose tasks have all completed, from pool
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Stage 13000 (collectAsMap at 
DecisionTree.scala:646) finished in 0.026 s
15/02/09 19:45:29 INFO scheduler.DAGScheduler: Job 8000 finished: collectAsMap 
at DecisionTree.scala:646, took 0.342683 s
15/02/09 19:45:29 INFO rdd.MapPartitionsRDD: Removing RDD 19988 from 
persistence list
15/02/09 19:45:29 INFO storage.BlockManager: Removing RDD 19988
15/02/09 19:45:29 INFO tree.RandomForest: Internal timing for DecisionTree:
15/02/09 19:45:29 INFO tree.RandomForest:   init: 9.903233409
  total: 15.855226062
  findSplitsBins: 4.557418734
  findBestSplits: 5.928304151
  chooseSplits: 5.927796717
15/02/09 19:45:29 INFO tree.GradientBoostedTrees: Internal timing for 
DecisionTree:
15/02/09 19:45:29 INFO tree.GradientBoostedTrees:   building tree 584: 
9.53796807
  building tree 303: 5.870926773
  building tree 293: 5.379115341
  building tree 599: 9.263506141
  building tree 479: 7.648729795
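
The per-tree timer entries at the end of the log above give a rough way to estimate the constant in `t(n) = t(n-1) + const`. The following is only a sketch: the helper names `tree_timings` and `slope_per_tree` are made up for illustration, and it assumes the number after "building tree" is the boosting iteration index.

```python
import re

# Timer entries look like "building tree 584: 9.53796807" (index, then seconds).
TREE_TIME = re.compile(r"building tree (\d+): ([0-9.]+)")

def tree_timings(log_text):
    """Return sorted (tree_index, seconds) pairs found in the log text."""
    flat = " ".join(log_text.split())  # undo mail-client line wrapping
    return sorted((int(n), float(t)) for n, t in TREE_TIME.findall(flat))

def slope_per_tree(points):
    """Ordinary least-squares slope of seconds against tree index."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# The five timer entries copied from the log above.
sample = """
15/02/09 19:45:29 INFO tree.GradientBoostedTrees:   building tree 584:
9.53796807
  building tree 303: 5.870926773
  building tree 293: 5.379115341
  building tree 599: 9.263506141
  building tree 479: 7.648729795
"""

points = tree_timings(sample)
print(points)
print("extra seconds per tree:", slope_per_tree(points))
```

On these five points the fitted slope is roughly 0.013 s of extra build time per tree, which is only indicative given how few samples there are.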

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Tuesday, 10 February 2015 7:07 AM
To: Christopher Thom
Cc: user@spark.apache.org
Subject: Re: [MLlib] Performance issues when building GBM models

Could you check the Spark UI and see whether there are RDDs being kicked out 
during the computation? We cache the residual RDD after each iteration. If we 
don't have enough memory/disk, it gets recomputed and results in something like 
`t(n) = t(n-1) + const`. We might cache the features multiple times, which 
could be improved.
-Xiangrui
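
To see why a per-tree cost of `t(n) = t(n-1) + const` makes the total build time grow quadratically in the number of trees, here is a small arithmetic sketch. The function names and the values of `base` and `const` are hypothetical, chosen only to illustrate the shape of the growth.

```python
def tree_build_times(n_trees, base, const):
    """Per-tree times under t(n) = t(n-1) + const, with the first tree taking `base`."""
    return [base + const * k for k in range(n_trees)]

def total_build_time(n_trees, base, const):
    """Closed form of the sum: N*base + const*N*(N-1)/2, which is O(N^2)."""
    return n_trees * base + const * n_trees * (n_trees - 1) / 2

# Hypothetical numbers: first tree takes 1.0 s, each later tree 0.5 s longer.
base, const = 1.0, 0.5
for n in (150, 300, 600):
    summed = sum(tree_build_times(n, base, const))
    assert summed == total_build_time(n, base, const)
    print(n, "trees:", summed, "s")
```

Doubling the number of trees roughly quadruples the total, whereas a constant per-tree cost would only double it.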

On Sun, Feb 8, 2015 at 5:32 PM, Christopher Thom 
<christopher.t...@quantium.com.au> wrote:
> Hi All,
>
> I wonder if anyone else has some experience building a Gradient Boosted Trees 
> model using spark/mllib? I have noticed when building decent-size models that 
> the process slows down over time. We observe that the time to build tree n is 
> approximately a constant time longer than the time to build tree n-1 i.e. 
> t(n) = t(n-1) + const. The implication is that the total build time goes as 
> something like N^2, where N is the total number of trees. I would expect that 
> the algorithm should be approximately linear in total time (i.e. each 
> boosting iteration takes roughly the same time to complete).
>
> So I have a couple of questions:
> 1. Is this behaviour expected, or consistent with what others are seeing?
> 2. Does anyone know if there are tuning parameters (e.g. in the boosting 
> strategy, or tree strategy) that may be impacting this?
>
> All aspects of the build seem to slow down as I go. Here's a random example 
> culled from the logs, from the beginning and end of the model build:
>
> 15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count
> at DecisionTreeMetadata.scala:111, took 0.077957 s ....
> 15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished:
> count at DecisionTreeMetadata.scala:111, took 5.495166 s
>
> Any thoughts or advice, or even suggestions on where to dig for more info 
> would be welcome.
>
> thanks
> chris
>
> Christopher Thom
>
> QUANTIUM
> Level 25, 8 Chifley, 8-12 Chifley Square Sydney NSW 2000
>
> T: +61 2 8222 3577
> F: +61 2 9292 6444
>
> W: quantium.com.au<www.quantium.com.au>
>
> ________________________________
>
> linkedin.com/company/quantium<www.linkedin.com/company/quantium>
>
> facebook.com/QuantiumAustralia<www.facebook.com/QuantiumAustralia>
>
> twitter.com/QuantiumAU<www.twitter.com/QuantiumAU>
>
>
> The contents of this email, including attachments, may be confidential 
> information. If you are not the intended recipient, any use, disclosure or 
> copying of the information is unauthorised. If you have received this email 
> in error, we would be grateful if you would notify us immediately by email 
> reply, phone (+ 61 2 9292 6400) or fax (+ 61 2 9292 6444) and delete the 
> message from your system.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For
> additional commands, e-mail: user-h...@spark.apache.org
>
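
The "Job N finished ... took X s" entries quoted above can be pulled out of a driver log with a short script, which makes the slowdown easy to track across a whole run. This is a sketch rather than a definitive tool: the helper name `job_durations` is hypothetical, and it assumes the log format shown in this thread, including mail-client line wrapping and `>` quote markers.

```python
import re

# Matches entries such as
# "Job 42 finished: count at DecisionTreeMetadata.scala:111, took 0.077957 s"
JOB_FINISHED = re.compile(r"Job (\d+) finished: (.+?), took ([0-9.]+) s")

def job_durations(log_text):
    """Return (job_id, call_site, seconds) for every 'Job N finished' entry."""
    # Drop e-mail quote markers, then collapse wrapped lines into one string.
    lines = (re.sub(r"^\s*>+\s?", "", line) for line in log_text.splitlines())
    flat = " ".join(" ".join(lines).split())
    return [(int(j), site, float(s)) for j, site, s in JOB_FINISHED.findall(flat)]

# The two entries quoted in the original message.
sample = """
> 15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count
> at DecisionTreeMetadata.scala:111, took 0.077957 s ....
> 15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished:
> count at DecisionTreeMetadata.scala:111, took 5.495166 s
"""

jobs = job_durations(sample)
first, last = jobs[0], jobs[-1]
print(jobs)
print("slowdown factor:", last[2] / first[2])
```

For the two quoted entries, the same `count` call site is about 70 times slower at job 7954 than at job 42.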

