Thanks Jeff.

Yep, that was helpful.

Btw, the (i) icon for interpreter binding modes points to a broken link
(looks like the version segment after /docs/ is missing):

https://zeppelin.apache.org/docs//usage/interpreter/interpreter_binding_mode.html


What do you think about https://issues.apache.org/jira/browse/ZEPPELIN-3334,
"Set spark.scheduler.pool to authenticated user name"?
I still think it makes sense.
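
Roughly what I have in mind (a minimal sketch in Scala, not an actual patch;
"runParagraph" and "authenticatedUser" are made-up names, and "sc" is the
shared SparkContext):

    import org.apache.spark.SparkContext

    // Hypothetical: before executing a user's paragraph, the submitting
    // thread tags its jobs with a per-user FAIR scheduler pool.
    def runParagraph(sc: SparkContext, authenticatedUser: String)(body: => Unit): Unit = {
      sc.setLocalProperty("spark.scheduler.pool", authenticatedUser)
      try body
      finally sc.setLocalProperty("spark.scheduler.pool", null) // back to the default pool
    }

With that in place, each user's paragraphs would queue in their own pool
instead of behind everyone else's jobs.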




-- 
Ruslan Dautkhanov

On Wed, Mar 14, 2018 at 6:32 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Globally shared mode means all users share the SparkContext and also the
> same Spark interpreter. That's why in this mode code is executed
> sequentially; concurrency is not allowed here, as there may be dependencies
> between paragraphs and concurrent execution cannot guarantee the execution
> order.
>
> For your scenario, I think you can use scoped per-user mode, where all
> users share the same SparkContext but each uses a different Spark
> interpreter instance.
>
>
>
> ankit jain <ankitjain....@gmail.com> wrote on Thursday, March 15, 2018 at 7:25 AM:
>
>> We are seeing the same PENDING behavior despite running the Spark
>> interpreter in "Isolated per User" mode - we expected one SparkContext to
>> be created per user, and we did indeed see multiple SparkSubmit processes
>> spun up on the Zeppelin pod.
>>
>> But why do paragraphs go to PENDING if there are multiple contexts that
>> can run in parallel? Is the assumption that multiple SparkSubmit processes
>> = multiple SparkContexts correct?
>>
>> Thanks
>> Ankit
>>
>> On Wed, Mar 14, 2018 at 4:12 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> Looked at the code; the only place Zeppelin handles
>>> spark.scheduler.pool is here:
>>>
>>> https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103
>>>
>>> I don't think that matches the Spark documentation's description of how
>>> multiple concurrent users can submit jobs independently (each user's
>>> *thread* has to set a different value for *spark.scheduler.pool*).
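>>>
>>> To illustrate the per-thread part (a toy example, assuming a shared
>>> SparkContext "sc" as in spark-shell, with spark.scheduler.mode=FAIR;
>>> the pool names are made up):
>>>
>>>     // Each submitting thread sets its own pool, so the two counts can
>>>     // run concurrently instead of queueing behind each other.
>>>     val t1 = new Thread(new Runnable {
>>>       def run(): Unit = {
>>>         sc.setLocalProperty("spark.scheduler.pool", "user_a")
>>>         sc.parallelize(1 to 1000000).count()
>>>       }
>>>     })
>>>     val t2 = new Thread(new Runnable {
>>>       def run(): Unit = {
>>>         sc.setLocalProperty("spark.scheduler.pool", "user_b")
>>>         sc.parallelize(1 to 1000000).count()
>>>       }
>>>     })
>>>     t1.start(); t2.start(); t1.join(); t2.join()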
>>>
>>> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
>>> *spark.scheduler.pool* to the authenticated user name.
>>>
>>> Other ideas?
>>>
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov <dautkha...@gmail.com
>>> > wrote:
>>>
>>>> Let's say we have a Spark interpreter set up as
>>>> "The interpreter will be instantiated *Globally* in *shared* process".
>>>>
>>>> When one user is using the Spark interpreter, other users trying to use
>>>> the same interpreter get PENDING until the first user's code completes.
>>>>
>>>> Per the Spark documentation
>>>> (https://spark.apache.org/docs/latest/job-scheduling.html):
>>>>
>>>> " *within* each Spark application, multiple “jobs” (Spark actions) may
>>>>> be running concurrently if they were submitted by different threads
>>>>> ... /skip/
>>>>> threads. By “job”, in this section, we mean a Spark action (e.g. save,
>>>>>  collect) and any tasks that need to run to evaluate that action.
>>>>> Spark’s scheduler is fully thread-safe and supports this use case to 
>>>>> enable
>>>>> applications that serve multiple requests (e.g. queries for multiple 
>>>>> users).
>>>>> ... /skip/
>>>>> Without any intervention, newly submitted jobs go into a *default
>>>>> pool*, but jobs’ pools can be set by adding the *spark.scheduler.pool*
>>>>>  “local property” to the SparkContext in the thread that’s submitting
>>>>> them.    "
>>>>
>>>>
>>>> So Spark allows multiple users to use the same shared SparkContext.
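>>>>
>>>> i.e., the call shown on that page ("pool1" is just the docs' example
>>>> pool name), run in the thread that submits a given user's jobs:
>>>>
>>>>     sc.setLocalProperty("spark.scheduler.pool", "pool1")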
>>>>
>>>> Two quick questions:
>>>> 1. Why are concurrent users getting PENDING in Zeppelin?
>>>> 2. Does Zeppelin set *spark.scheduler.pool* per user thread, as described
>>>> above?
>>>>
>>>> PS.
>>>> We have set the following Spark interpreter settings:
>>>> - zeppelin.spark.concurrentSQL= true
>>>> - spark.scheduler.mode = FAIR
>>>>
>>>>
>>>> Thank you,
>>>> Ruslan Dautkhanov
>>>>
>>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Ankit.
>>
>
