I have a Spark Streaming job whose micro-batches take about 2 seconds each. When I check where the time is spent, I see about 0.8s-1s of processing time, although the total batch time is 2s. That extra second is spent in the driver. I reviewed the code executed by the driver and commented out some of it, with the same result, so I have no idea where the time is going.
Right now I'm executing in client mode from one of the nodes inside the cluster, so I can't set the number of cores for the driver (although I don't think that would make a difference). How can I find out where the driver is spending the time? I'm not sure whether it's possible to improve performance at this point, or whether that second is mainly spent scheduling the DAG of each micro-batch.
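In case it helps, this is a sketch of how I'm thinking of breaking down the per-batch timings from inside the application, using Spark's `StreamingListener` API (the listener class name is my own; `ssc` is assumed to be my `StreamingContext`):

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical listener to log, for every completed batch, how much time
// was spent waiting to be scheduled vs. actually processing (all in ms).
// Registered with: ssc.addStreamingListener(new BatchTimingListener)
class BatchTimingListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(
      s"batch=${info.batchTime} " +
      s"schedulingDelay=${info.schedulingDelay.getOrElse(-1L)}ms " +
      s"processingDelay=${info.processingDelay.getOrElse(-1L)}ms " +
      s"totalDelay=${info.totalDelay.getOrElse(-1L)}ms")
  }
}
```

My assumption is that if `schedulingDelay` accounts for most of the missing second, the time is going into micro-batch scheduling on the driver rather than into my own driver-side code, but I'd like to confirm whether that reasoning is right.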