Hi guys,
Could anyone help me understand the logs below? Why is the result in the
second log 0?
Thanks, guys.
14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job
1392919557000 ms.0 from job set of time 1392919557000 ms
14/02/20 19:06:00 INFO JobScheduler: Total delay: 3.185 s for time
1392919557000 ms (execution: 3.167 s)
14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time
1392919557000 ms
14/02/20 19:06:00 INFO SparkContext: Starting job: first at
NetworkWordCount.scala:87
14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job
1392919558000 ms.0 from job set of time 1392919558000 ms
14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time
1392919557000 ms to file
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000'
14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 812 (combineByKey
at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO DAGScheduler: Got job 91 (first at
NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 181 (first at
NetworkWordCount.scala:87)
14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 182
(MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42),
which has no missing parents
14/02/20 19:06:00 INFO DAGScheduler: Submitting 2 missing tasks from
Stage 182 (MapPartitionsRDD[812] at combineByKey at
ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 182.0 with 2 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:1 as TID 609
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:1 as 3023
bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:0 as TID 610
on executor 0: computer1.ant-net (NODE_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:0 as 3485
bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 609 in 17 ms on
computer1.ant-net (progress: 0/2)
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 1)
14/02/20 19:06:00 INFO BlockManagerMasterActor$BlockManagerInfo: Added
input-0-1392919527400 in memory on computer1.ant-net:41142 (size: 2018.6
KB, free: 387.3 MB)
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 610 in 67 ms on
computer1.ant-net (progress: 1/2)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 182.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 182 (combineByKey at
ShuffledDStream.scala:42) finished in 0.080 s
14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages
14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4)
14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 181)
14/02/20 19:06:00 INFO DAGScheduler: failed: Set()
14/02/20 19:06:00 INFO CheckpointWriter: Deleting
hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919554000.bk
14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 181: List()
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 181
(MappedRDD[815] at map at MappedDStream.scala:35), which is now runnable
14/02/20 19:06:00 INFO CheckpointWriter: Checkpoint for time
1392919557000 ms saved to file
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000', took
3270 bytes and 102 ms
14/02/20 19:06:00 INFO DStreamGraph: Clearing checkpoint data for time
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Cleared checkpoint data for time
1392919557000 ms
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from
Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 181.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 181.0:0 as TID 611
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 181.0:0 as 2057
bytes in 1 ms
14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map
output locations for shuffle 90 to sp...@computer1.ant-net:47226
14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses
for shuffle 90 is 146 bytes
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 611 in 25 ms on
computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 181.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(181, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 181 (first at
NetworkWordCount.scala:87) finished in 0.027 s
14/02/20 19:06:00 INFO SparkContext: Job finished: first at
NetworkWordCount.scala:87, took 0.133625862 s
118967 (Total of words in a RDD)
#######################################################################################
14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job
1392919558000 ms.0 from job set of time 1392919558000 ms
14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time
1392919558000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time
1392919558000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time
1392919558000 ms
14/02/20 19:06:00 INFO SparkContext: Starting job: first at
NetworkWordCount.scala:87
14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time
1392919558000 ms to file
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919558000'
14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 821 (combineByKey
at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO JobScheduler: Total delay: 2.322 s for time
1392919558000 ms (execution: 0.134 s)
14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job
1392919559000 ms.0 from job set of time 1392919559000 ms
14/02/20 19:06:00 INFO DAGScheduler: Got job 92 (first at
NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 183 (first at
NetworkWordCount.scala:87)
14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 184)
14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 184)
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 184
(MapPartitionsRDD[821] at combineByKey at ShuffledDStream.scala:42),
which has no missing parents
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from
Stage 184 (MapPartitionsRDD[821] at combineByKey at
ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 184.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 184.0:0 as TID 612
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 184.0:0 as 3024
bytes in 1 ms
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 612 in 17 ms on
computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 184.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(184, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 184 (combineByKey at
ShuffledDStream.scala:42) finished in 0.018 s
14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages
14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4)
14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 183)
14/02/20 19:06:00 INFO DAGScheduler: failed: Set()
14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 183: List()
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 183
(MappedRDD[824] at map at MappedDStream.scala:35), which is now runnable
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from
Stage 183 (MappedRDD[824] at map at MappedDStream.scala:35)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 183.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 183.0:0 as TID 613
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 183.0:0 as 2057
bytes in 1 ms
14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map
output locations for shuffle 91 to sp...@computer1.ant-net:47226
14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses
for shuffle 91 is 137 bytes
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 613 in 23 ms on
computer1.ant-net (progress: 0/1)
14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 183.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(183, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 183 (first at
NetworkWordCount.scala:87) finished in 0.026 s
14/02/20 19:06:00 INFO SparkContext: Job finished: first at
NetworkWordCount.scala:87, took 0.072442522 s
0 (Total of words in a RDD)