I am seeing really weird behavior when loading data from an RDBMS.
I tried a different approach for loading the data - I provided a partitioning
column to make the read parallel:
val df_init = sqlContext.read.format("jdbc").options(
  Map("url" -> Configuration.dbUrl,
    "dbtable" -> "my_table",                        // placeholder table name
    "partitionColumn" -> "id",                      // placeholder for my partitioning column
    "lowerBound" -> "1", "upperBound" -> "1000000", // placeholder bounds
    "numPartitions" -> "12")).load()
Hi Talebzadeh,
sorry, I forgot to answer the last part of your question:
At the O/S level you should see many CoarseGrainedExecutorBackend processes
through jps, each corresponding to one executor. Are they doing anything?
There is one worker with one executor busy and the rest are almost idle:
PID USER PR
Hi Talebzadeh,
we are using 6 worker machines, all running.
We are reading the data through sqlContext (DataFrame), as the documentation
suggests in preference to JdbcRDD.
prop just specifies the user name, password, and driver class.
Right after this data load we register it as a temp table
val df_init = sqlContext.read.jdbc(Configuration.dbUrl, "my_table", prop) // placeholder table name
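For reference, prop and the registration look roughly like this; the property
values are placeholders, not our real credentials:

import java.util.Properties

val prop = new Properties()
prop.setProperty("user", "dbUser")                  // placeholder
prop.setProperty("password", "dbPassword")          // placeholder
prop.setProperty("driver", "org.postgresql.Driver") // placeholder driver class

df_init.registerTempTable("df_init")                // the temp table mentioned above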
Hi Jakub,
Sounds like one executor. Can you point out:
1. The number of slaves/workers you are running
2. Are you using JDBC to read data in?
3. Do you register the DF as a temp table, and if so, have you cached the temp table (see the snippet below)?
Sounds like only one executor is active and the rest are sitting idle.
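If the temp table is not cached yet, in Spark 1.x that would look roughly like
this (table name is just an example):

sqlContext.cacheTable("mytable")                       // cache the registered temp table
sqlContext.sql("SELECT count(*) FROM mytable").show()  // an action forces the lazy cache to materialise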
I have seen similar behavior in my standalone cluster. I tried to increase the
number of partitions, and at some point all the executors or worker nodes
started making parallel connections to the remote data store. But it would be
nice if someone could point us to some references on how to ma
Hello,
I have a Spark cluster running in standalone mode, with a master + 6 executors.
My application reads data from a database via sqlContext.read (the DataFrame
API), then filters rows. After that I re-partition the data, and I wonder why,
on the executors page of the driver UI, I see RDD blocks allocated to just one executor.
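The pipeline is roughly the following; the connection settings, filter
condition and partition count are simplified placeholders:

val url = "jdbc:postgresql://dbhost:5432/mydb"   // placeholder connection URL
val df = sqlContext.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "my_table")                 // placeholder table
  .load()

val filtered = df.filter("status = 'ACTIVE'")    // placeholder predicate
val repartitioned = filtered.repartition(24)     // spread the rows over the 6 executors
repartitioned.persist()
println(repartitioned.count())                   // action so the cached RDD blocks show up in the UI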