I'm working with Spark 0.9.0 on CDH5.
I'm running a Spark application written in Java in yarn-client mode.
Because of the way it was installed on the cluster, I need to run the
application as the hdfs user; otherwise I have a permission problem and get
the following error:
I'm trying to create an RDD from multiple scans.
I tried to set the configuration this way:
Configuration config = HBaseConfiguration.create();
config.setStrings(MultiTableInputFormat.SCANS, scanStrings);
And I create each scan string in the scanStrings array this way:
Scan scan = new Scan();
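For context, here is a minimal sketch of how the scan strings could be built, assuming one Scan per table; the table names are placeholders, and `convertScanToString` may not be public in every HBase version (if it isn't in yours, Base64-encoding `ProtobufUtil.toScan(scan).toByteArray()` is the usual workaround):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: serialize one Scan per table for MultiTableInputFormat.SCANS.
// "table1"/"table2" are placeholder table names.
List<String> scans = new ArrayList<String>();
for (String tableName : new String[] { "table1", "table2" }) {
    Scan scan = new Scan();
    // MultiTableInputFormat reads the target table from this scan attribute.
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
    scans.add(TableMapReduceUtil.convertScanToString(scan));
}
String[] scanStrings = scans.toArray(new String[scans.size()]);
```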
YARN also has this scheduling option.
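For reference, switching YARN to the FairScheduler is done in yarn-site.xml; this is a minimal config sketch (the property name and scheduler class are the standard Hadoop ones, queue tuning would go in a separate fair-scheduler.xml):

```xml
<!-- yarn-site.xml: use the FairScheduler instead of the default -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```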
The problem is that all of our applications have the same flow: the first
stage is the heaviest and the rest are very small.
So when several requests (applications) start to run at the same time,
the first stage of each is scheduled in parallel, and