I ran a series of TeraGen jobs via spark-submit (each job generated a dataset of the same size into a different S3 bucket). I noticed that some jobs were fast and some were slow.
The slow jobs always had many log lines like

DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1

(or 2, etc.), while the fast jobs always had only a few of them. Can someone explain why the number of these debug lines varies between executions of the same job? The more of these lines I see, the slower the job is. Has anyone experienced the same behavior?

Thanks,
Gil