Hello,

I have a Spark cluster running in standalone mode: one master plus 6 executors.

My application reads data from a database via DataFrame.read, then
filters some rows. After that I repartition the data, and I wonder
why the Executors page of the driver UI still shows all RDD blocks
allocated on a single executor machine.

[screenshot: Executors page of the driver UI, all RDD blocks on one executor]
As highlighted in the picture above. I expected that after the
repartition the data would be shuffled across the cluster, but that is
clearly not happening here.
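
For context, here is roughly what the job does. This is a minimal
sketch of my pipeline; the connection URL, table name, filter
condition, and partition count are placeholders, not my real values:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.col

  val spark = SparkSession.builder()
    .appName("RepartitionQuestion")
    .getOrCreate()

  // Plain JDBC read (URL, table, and credentials are placeholders)
  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .load()

  // Filter rows, then repartition to spread them across the cluster
  val repartitioned = df
    .filter(col("status") === "active")  // placeholder filter
    .repartition(48)                     // placeholder partition count

  // Cache and materialize so RDD blocks show up on the Executors page
  repartitioned.cache()
  repartitioned.count()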

I can understand that the database read itself happens in a
non-parallel fashion, but as far as I understand, the repartition
should fix that.
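
In case it matters: my understanding is that the JDBC read could also
be parallelized directly with the standard partitioning options, along
these lines (the column name "id" and the bounds here are assumptions
for illustration, not my actual schema):

  // Spark issues one query per partition over the given id range
  val parallelDf = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .option("partitionColumn", "id")     // assumed numeric column
    .option("lowerBound", "1")           // assumed range bounds
    .option("upperBound", "1000000")
    .option("numPartitions", "6")        // one partition per executor
    .load()

But even without that, I would expect the explicit repartition to
distribute the blocks.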

Could someone with more experience clarify this?

Thanks
