Hi Jakub,

A few questions to narrow this down:


   1. How many slaves/workers are you running?
   2. Are you using JDBC to read the data in?
   3. Do you register the DF as a temp table, and if so, have you cached it?
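
On point 2: a JDBC read with no partitioning options comes back as a single partition, so one executor ends up doing the whole read. Here is a minimal PySpark-style sketch of the standard partitioning options (partitionColumn, lowerBound, upperBound, numPartitions). The helper names, URL, table and column are made up for illustration, and `spark` is assumed to be an existing SparkSession.

```python
def partitioned_jdbc_options(url, table, column, lower, upper, num_partitions):
    """Build the options for a JDBC read split into num_partitions ranges.

    Hypothetical helper: only the option keys are Spark's own names.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,   # column whose range is split across partitions
        "lowerBound": str(lower),    # bounds only set the stride; rows are not filtered
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Run the read; spark is assumed to be an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details, not taken from the setup in this thread.
opts = partitioned_jdbc_options(
    "jdbc:postgresql://dbhost:5432/mydb", "events", "id", 1, 6000000, 6
)
```

After `df = read_partitioned(spark, opts)`, `df.rdd.getNumPartitions()` should report 6 rather than 1, before any repartition is applied.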

It sounds like only one executor is active and the rest are sitting idle.

At the O/S level, jps should show many CoarseGrainedExecutorBackend
processes, each corresponding to one executor. Are they doing anything?
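
To make the jps check quick to repeat on each worker, here is a rough Python sketch that counts the executor JVMs. The function names are mine, and it assumes jps is on the PATH and is run as the user that owns the Spark processes. It only tells you the executors exist; whether they are busy still needs something like top.

```python
import subprocess

EXECUTOR_CLASS = "CoarseGrainedExecutorBackend"


def count_executors(jps_output):
    """Count lines of jps output whose main class is the executor backend."""
    count = 0
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>", one JVM per line
        if len(parts) >= 2 and parts[1] == EXECUTOR_CLASS:
            count += 1
    return count


def executors_on_this_host():
    """Run jps locally and count the executor JVMs it lists."""
    result = subprocess.run(
        ["jps"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return count_executors(result.stdout)
```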

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 14 July 2016 at 17:18, Jakub Stransky <stransky...@gmail.com> wrote:

> Hello,
>
> I have a Spark cluster running in a single mode, master + 6 executors.
>
> My application reads data from a database via DataFrame.read and then
> filters the rows. After that I repartition the data, and I wonder why the
> executors page of the driver UI still shows all RDD blocks allocated on a
> single executor machine.
>
> [image: Inline images 1]
> This is highlighted in the picture above. I expected that after the
> repartition the data would be shuffled across the cluster, but that is
> obviously not happening here.
>
> I can understand that the database read happens in a non-parallel fashion,
> but repartitioning should fix that, as far as I understand.
>
> Could someone experienced clarify that?
>
> Thanks
>
