answer for Gourav Sengupta

I want to use the same Spark application because I want it to behave like a FIFO
scheduler. My problem is that I have many jobs (none of them very big), and if I
run a separate application for every job, my cluster splits its resources the way
a FAIR scheduler would (at least that's what I observe; maybe I'm wrong), which
can create a bottleneck. Start-up time isn't a problem for me, because
it isn't a real-time application.
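
What I have in mind is roughly the sketch below: one long-running application,
one SparkContext, and jobs submitted to it one after another (inside a single
application Spark schedules jobs FIFO by default). The object name and the job
list are made up just to illustrate the idea:

import org.apache.spark.{SparkConf, SparkContext}

object FifoJobRunner {
  def main(args: Array[String]): Unit = {
    // One application, one SparkContext; within a single application the
    // default scheduling mode is FIFO, so jobs run in submission order and
    // each one gets the application's full set of executors.
    val sc = new SparkContext(new SparkConf().setAppName("fifo-job-runner"))

    // Hypothetical job definitions: each job is just a function of the shared context.
    val jobs: Seq[SparkContext => Unit] = Seq(
      ctx => println(ctx.parallelize(1 to 1000000, 5).sum()),
      ctx => println(ctx.parallelize(1 to 500000, 5).map(_ * 2).count())
    )

    // Submitting the actions sequentially from the driver gives FIFO order.
    jobs.foreach(job => job(sc))

    sc.stop()
  }
}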

I need a business-grade solution; that's the reason I can't use code from
GitHub.

Thanks!

2017-02-07 19:55 GMT+02:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

> Hi,
>
> May I ask the reason for using the same Spark application? Is it because
> of the time it takes to start a Spark context?
>
> On another note, you may want to look at the number of contributors in a
> GitHub repo before choosing a solution.
>
>
> Regards,
> Gourav
>
> On Tue, Feb 7, 2017 at 5:26 PM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> Spark Jobserver or Livy server are the best options for a purely technical
>> API.
>> If you want to publish a business API, you will probably have to build your
>> own app like the one I wrote a year ago:
>> https://github.com/elppc/akka-spark-experiments
>> It combines Akka actors and a shared Spark context to serve concurrent
>> sub-second jobs.
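>>
>> Roughly, the shared-context part of that app looks like the sketch below
>> (just an illustration; the Akka actor layer is omitted and the class name
>> is made up): every request becomes a Future that runs its own job on the
>> one SparkContext, and Spark itself schedules the concurrent jobs.
>>
>> import scala.concurrent.{ExecutionContext, Future}
>> import org.apache.spark.SparkContext
>>
>> // One SparkContext shared by all requests; each request runs as a Future.
>> // Concurrent jobs on a single context are scheduled by Spark (FIFO by
>> // default, FAIR if spark.scheduler.mode=FAIR is set).
>> class SharedContextService(sc: SparkContext)(implicit ec: ExecutionContext) {
>>   def count(n: Int): Future[Long] = Future {
>>     sc.parallelize(1 to n, 4).count()
>>   }
>> }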
>>
>>
>> 2017-02-07 15:28 GMT+01:00 ayan guha <guha.a...@gmail.com>:
>>
>>> I think you are looking for Livy or Spark Jobserver.
>>>
>>> On Wed, 8 Feb 2017 at 12:37 am, Cosmin Posteuca <
>>> cosmin.poste...@gmail.com> wrote:
>>>
>>>> I want to run different jobs on demand with the same Spark context, but I
>>>> don't know exactly how to do this.
>>>>
>>>> I try to get the current context, but it seems to create a new Spark
>>>> context (with new executors).
>>>>
>>>> I call spark-submit to add new jobs.
>>>>
>>>> I run the code on Amazon EMR (3 instances, 4 cores & 16 GB RAM per
>>>> instance), with YARN as the resource manager.
>>>>
>>>> My code:
>>>>
>>>> import org.apache.spark.SparkContext
>>>>
>>>> val sparkContext = SparkContext.getOrCreate()
>>>> val content = 1 to 40000
>>>> val result = sparkContext.parallelize(content, 5)
>>>> result.map(value => value.toString).foreach(loop)
>>>>
>>>> // Busy loop that just burns CPU on the executors to simulate work.
>>>> def loop(x: String): Unit = {
>>>>   for (a <- 1 to 30000000) {
>>>>   }
>>>> }
>>>>
>>>> spark-submit:
>>>>
>>>> spark-submit --executor-cores 1 \
>>>>              --executor-memory 1g \
>>>>              --driver-memory 1g \
>>>>              --master yarn \
>>>>              --deploy-mode cluster \
>>>>              --conf spark.dynamicAllocation.enabled=true \
>>>>              --conf spark.shuffle.service.enabled=true \
>>>>              --conf spark.dynamicAllocation.minExecutors=1 \
>>>>              --conf spark.dynamicAllocation.maxExecutors=3 \
>>>>              --conf spark.dynamicAllocation.initialExecutors=3 \
>>>>              --conf spark.executor.instances=3 \
>>>>
>>>> If I run spark-submit twice, it creates 6 executors, but I want to run
>>>> all these jobs in the same Spark application.
>>>>
>>>> How can I add jobs to an existing Spark application?
>>>>
>>>> I don't understand why SparkContext.getOrCreate() doesn't return the
>>>> existing Spark context.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Cosmin P.
>>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
