I'm playing around with dynamic allocation in Spark 1.5.0, with the FAIR scheduler, so I can define a long-running application capable of executing multiple Spark jobs simultaneously.
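For context, this is roughly how I'm setting things up (the app name and pool name below are just examples; the pool itself is defined in my fairscheduler.xml):

import org.apache.spark.{SparkConf, SparkContext}

// Enable dynamic allocation and FAIR scheduling for the whole application.
val conf = new SparkConf()
  .setAppName("long-running-app")                  // example name
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")    // required for dynamic allocation
  .set("spark.scheduler.mode", "FAIR")

val sc = new SparkContext(conf)

// Each concurrent job is submitted from its own thread and assigned to a FAIR pool.
sc.setLocalProperty("spark.scheduler.pool", "jobPool")   // example pool name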
The kind of jobs I'm running doesn't benefit from more than 4 cores each, but I want my application to be able to take several times that so it can run multiple jobs at the same time.

I suppose my question is more basic: how can I limit the number of cores used to load an RDD or DataFrame? I can repartition or coalesce my RDD or DataFrame down to 4 partitions immediately after I load it, but that doesn't stop Spark from using more cores to load it in the first place.

Does what I'm trying to accomplish make sense, and is there any way to do it?

- Philip
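P.S. To make that last point concrete, this is roughly the pattern I'm using today (the path and partition count are just placeholders):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// The initial scan runs one task per input split, so it can
// grab far more than 4 cores while the data is being read...
val df = sqlContext.read.parquet("/data/example/path")   // placeholder path

// ...and only *after* the load do I shrink it down to 4 partitions.
val df4 = df.coalesce(4)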