Hello,

I have a Spark cluster running in standalone mode: one master plus 6 executors.

My application reads data from a database via DataFrame.read, then
filters some rows. After that I repartition the data, and I wonder
why the Executors page of the driver UI still shows all RDD blocks
allocated on a single executor machine.

[screenshot: Executors page of the driver UI, all RDD blocks on one executor]
As highlighted in the picture above. I expected that after the
repartition the data would be shuffled across the cluster, but that is
clearly not happening here.
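
For context, here is roughly what the job does. This is a minimal
sketch of my pipeline; the connection URL, table name, filter
condition, and partition count are placeholders, not my real values:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.col

  val spark = SparkSession.builder()
    .appName("RepartitionQuestion")
    .getOrCreate()

  // Plain JDBC read (URL, table, and credentials are placeholders)
  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .load()

  // Filter rows, then repartition to spread them across the cluster
  val repartitioned = df
    .filter(col("status") === "active")  // placeholder filter
    .repartition(48)                     // placeholder partition count

  // Cache and materialize so RDD blocks show up on the Executors page
  repartitioned.cache()
  repartitioned.count()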

I can understand that the database read itself happens in a
non-parallel fashion, but as far as I understand, the repartition
should fix that.
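
In case it matters: my understanding is that the JDBC read could also
be parallelized directly with the standard partitioning options, along
these lines (the column name "id" and the bounds here are assumptions
for illustration, not my actual schema):

  // Spark issues one query per partition over the given id range
  val parallelDf = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .option("partitionColumn", "id")     // assumed numeric column
    .option("lowerBound", "1")           // assumed range bounds
    .option("upperBound", "1000000")
    .option("numPartitions", "6")        // one partition per executor
    .load()

But even without that, I would expect the explicit repartition to
distribute the blocks.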

Could someone with more experience clarify this?

Thanks
