Re: why zeppelin SparkInterpreter use FIFOScheduler

Pranav Kumar Agarwal Thu, 03 Sep 2015 00:13:42 -0700

It had nothing to do with the changes related to the completion code. The issue was reproducible on master also.

It's due to the recent fix for ZEPPELIN-173.

In one of our environments the hostname lookup did not return the domain name after the hostname, whereas the request coming from the browser included hostname.domain.name. Basically, the equality check currentHost.equals(sourceHost) in the checkOrigin method of NotebookServer.java was failing.

I think the code should also fetch getCanonicalHostName and try it as one of the combinations before returning false. Since this was not much of a concern, we just commented out the newly added checkOrigin method in our local copy.
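To make the suggestion concrete, here is a minimal sketch of an origin check that compares the browser's Origin host against several local names, including the canonical one. The class and method names (OriginCheckSketch, originMatches, localHostNames) are made up for illustration and are not the actual NotebookServer code; only InetAddress.getHostName and InetAddress.getCanonicalHostName are real JDK calls.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not the actual NotebookServer.checkOrigin.
class OriginCheckSketch {

  // Returns true if the host part of the Origin header equals any candidate name.
  static boolean originMatches(String origin, Set<String> candidateHosts) {
    if (origin == null || origin.isEmpty()) {
      return false;
    }
    final String sourceHost;
    try {
      sourceHost = new URI(origin).getHost();
    } catch (URISyntaxException e) {
      return false;
    }
    if (sourceHost == null) {
      return false;
    }
    for (String candidate : candidateHosts) {
      if (sourceHost.equalsIgnoreCase(candidate)) {
        return true;
      }
    }
    return false;
  }

  // Collects the names this machine may be known by: localhost, the plain
  // hostname, and the canonical (usually fully qualified) hostname.
  static Set<String> localHostNames() {
    Set<String> names = new LinkedHashSet<String>();
    names.add("localhost");
    try {
      InetAddress local = InetAddress.getLocalHost();
      names.add(local.getHostName());
      names.add(local.getCanonicalHostName());
    } catch (UnknownHostException e) {
      // Resolution failed; only "localhost" remains as a candidate.
    }
    return names;
  }

  public static void main(String[] args) {
    String origin = args.length > 0 ? args[0] : "http://localhost:8080";
    System.out.println(originMatches(origin, localHostNames()));
  }
}
```

With this shape, a server that only knows itself as "myhost" would still accept a browser origin of "myhost.domain.name", provided the canonical lookup returns the qualified name.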


Regards,
-Pranav.

On 02/09/15 11:59 am, moon soo Lee wrote:

Hi,

I'm not sure what could be wrong.
Can you see any existing notebook?

Best,
moon

On Mon, Aug 31, 2015 at 8:48 PM Piyush Mukati (Data Platform) <piyush.muk...@flipkart.com> wrote:


    Hi,
    We have passed the InterpreterContext to completion(), and it is
    working well on my local dev setup.
    But after building with
    mvn  clean package  -P build-distr  -Pspark-1.4
    -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn
    I copied zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz to another
    machine.
    When running from there, it always shows "disconnected" and no
    notebooks are shown; I am not able to create a notebook
    either.

    Screenshot 2015-09-01 09.14.54.png
    I am not seeing anything in the logs. Can anyone please suggest how
    I can debug this further?
    Thanks.

    On Wed, Aug 26, 2015 at 8:27 PM, moon soo Lee <m...@apache.org> wrote:

        Hi Pranav,

        Thanks for sharing the plan.
        I think passing InterpreterContext to completion() makes sense.
        Although it changes interpreter api, changing now looks better
        than later.

        Thanks.
        moon

        On Tue, Aug 25, 2015 at 11:22 PM Pranav Kumar Agarwal
        <praag...@gmail.com> wrote:

            Hi Moon,

            > I think releasing SparkIMain and related objects
            By packaging I meant to ask what is the process to
            "release SparkIMain
            and related objects"? for Zeppelin's code uptake?

            I have one more question:
            Most of the changes to allow SparkInterpreter to support
            ParallelScheduler are implemented, but I'm struggling with the
            completion feature. Since I have a SparkIMain interpreter for
            each notebook, completion is not working as expected because
            the completion method doesn't receive an InterpreterContext.
            I need to be able to pull the notebook-specific SparkIMain
            interpreter to return correct completion results, and for
            that I need to know the notebook id at the time of the
            completion call.

            I'm planning to change the abstract method completion in
            Interpreter.java to pass InterpreterContext along with the
            buffer and cursor location. This will require refactoring all
            the Interpreters. It's a change in the contract, so I thought
            I would run it by you before embarking on it...
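            A rough sketch of what the changed contract could look like, using
            stand-in types rather than the real Zeppelin classes. ContextSketch,
            InterpreterSketch and PerNoteCompletionSketch are hypothetical names,
            and the per-note symbol list only stands in for a per-notebook
            SparkIMain; the real completion logic is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for InterpreterContext; only the note id is modeled.
class ContextSketch {
  final String noteId;

  ContextSketch(String noteId) {
    this.noteId = noteId;
  }
}

// Hypothetical stand-in for the interpreter contract after the proposed change:
// completion receives the context in addition to buffer and cursor.
abstract class InterpreterSketch {
  abstract List<String> completion(String buf, int cursor, ContextSketch context);
}

class PerNoteCompletionSketch extends InterpreterSketch {
  // One symbol list per note, standing in for one SparkIMain per notebook.
  private final Map<String, List<String>> symbolsByNote =
      new ConcurrentHashMap<String, List<String>>();

  void define(String noteId, String symbol) {
    symbolsByNote
        .computeIfAbsent(noteId, id -> new CopyOnWriteArrayList<String>())
        .add(symbol);
  }

  @Override
  List<String> completion(String buf, int cursor, ContextSketch context) {
    // Treat everything before the cursor as the prefix to complete.
    int end = Math.min(Math.max(cursor, 0), buf.length());
    String prefix = buf.substring(0, end);
    List<String> symbols =
        symbolsByNote.getOrDefault(context.noteId, Collections.<String>emptyList());
    List<String> matches = new ArrayList<String>();
    for (String symbol : symbols) {
      if (symbol.startsWith(prefix)) {
        matches.add(symbol);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    PerNoteCompletionSketch interpreter = new PerNoteCompletionSketch();
    interpreter.define("noteA", "sparkRows");
    interpreter.define("noteB", "sparkCols");
    System.out.println(interpreter.completion("spark", 5, new ContextSketch("noteA")));
  }
}
```

            The point of the sketch is only that the note id arrives with the
            completion call, so results from one notebook never leak into another.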

            Please let me know your thoughts.

            Regards,
            -Pranav.

            On 18/08/15 8:04 am, moon soo Lee wrote:
            > Could you explain a little bit more about the package changes you mean?
            >
            > Thanks,
            > moon
            >
            > On Mon, Aug 17, 2015 at 10:27 AM Pranav Agarwal
            > <praag...@gmail.com> wrote:
            >
            >     Any thoughts on how to package changes related to Spark?
            >
            >     On 17-Aug-2015 7:58 pm, "moon soo Lee" <m...@apache.org> wrote:
            >
            >         I think releasing SparkIMain and related objects
            after
            >         configurable inactivity would be good for now.
            >
            >         About scheduler, I can help implementing such
            scheduler.
            >
            >         Thanks,
            >         moon
            >
            >         On Sun, Aug 16, 2015 at 11:54 PM Pranav Kumar Agarwal
            >         <praag...@gmail.com> wrote:
            >
            >             Hi Moon,
            >
            >             Yes, the notebookid comes from
            InterpreterContext. At the
            >             moment destroying SparkIMain on deletion of
            notebook is
            >             not handled. I think SparkIMain is a
            lightweight object,
            >             do you see a concern having these objects in
            a map? One
            >             possible option could be to destroy notebook
            related
            >             objects when the inactivity on a notebook is
            greater than
            >             say 8 hours.
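The inactivity-based cleanup mentioned above could be sketched as a small registry keyed by notebook id that drops entries idle for longer than a configurable limit. IdleEvictingRegistrySketch and its methods are hypothetical names; timestamps are passed in explicitly so the behaviour is easy to check, and the stored value merely stands in for a SparkIMain.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry for illustration; V stands in for a per-notebook object.
class IdleEvictingRegistrySketch<V> {

  // Value plus the time it was last touched.
  private static final class Slot<T> {
    final T value;
    volatile long lastUsedMillis;

    Slot(T value, long nowMillis) {
      this.value = value;
      this.lastUsedMillis = nowMillis;
    }
  }

  private final Map<String, Slot<V>> slots = new ConcurrentHashMap<String, Slot<V>>();
  private final long maxIdleMillis;

  IdleEvictingRegistrySketch(long maxIdleMillis) {
    this.maxIdleMillis = maxIdleMillis;
  }

  void put(String noteId, V value, long nowMillis) {
    slots.put(noteId, new Slot<V>(value, nowMillis));
  }

  // Returns the value for the note, refreshing its last-used time, or null.
  V get(String noteId, long nowMillis) {
    Slot<V> slot = slots.get(noteId);
    if (slot == null) {
      return null;
    }
    slot.lastUsedMillis = nowMillis;
    return slot.value;
  }

  // Removes every entry idle for longer than the limit; returns how many were removed.
  int evictIdle(long nowMillis) {
    int before = slots.size();
    slots.values().removeIf(slot -> nowMillis - slot.lastUsedMillis > maxIdleMillis);
    return before - slots.size();
  }

  public static void main(String[] args) {
    long hour = 60L * 60L * 1000L;
    IdleEvictingRegistrySketch<String> registry =
        new IdleEvictingRegistrySketch<String>(8 * hour);
    registry.put("noteA", "interpreter-A", 0L);
    System.out.println(registry.evictIdle(9 * hour));
  }
}
```

A real implementation would also need to close whatever resources the evicted object holds, which this sketch leaves out.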
            >
            >
            >>             >> 4. Build a queue inside interpreter to
            allow only one
            >>             paragraph execution
            >>             >> at a time per notebook.
            >>
            >>             One downside of this approach is, GUI will
            display
            >>             RUNNING instead of PENDING for jobs inside
            of queue in
            >>             interpreter.
            >             Yes, that's a good point. Having a scheduler in the
            >             Zeppelin server that is parallel across notebooks and
            >             FIFO across paragraphs would be nice. Is there any
            >             plan for such a scheduler?
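One way to picture "parallel across notebooks, FIFO across paragraphs" is one single-threaded queue per notebook id. The sketch below uses only JDK executors; PerNoteFifoSchedulerSketch is a hypothetical name and this is not Zeppelin's Scheduler interface, so job status reporting (PENDING vs RUNNING) is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not Zeppelin's Scheduler interface.
class PerNoteFifoSchedulerSketch {
  // One single-threaded queue per note id: FIFO inside a note, independent across notes.
  private final Map<String, ExecutorService> queues =
      new ConcurrentHashMap<String, ExecutorService>();

  Future<?> submit(String noteId, Runnable paragraph) {
    ExecutorService queue =
        queues.computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor());
    return queue.submit(paragraph);
  }

  // Stops accepting new work; already queued paragraphs still run.
  void shutdown() {
    for (ExecutorService queue : queues.values()) {
      queue.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    PerNoteFifoSchedulerSketch scheduler = new PerNoteFifoSchedulerSketch();
    List<String> log = Collections.synchronizedList(new ArrayList<String>());
    List<Future<?>> pending = new ArrayList<Future<?>>();
    pending.add(scheduler.submit("noteA", () -> log.add("A1")));
    pending.add(scheduler.submit("noteA", () -> log.add("A2")));
    pending.add(scheduler.submit("noteB", () -> log.add("B1")));
    for (Future<?> f : pending) {
      f.get();
    }
    scheduler.shutdown();
    // A1 always precedes A2; B1 may appear anywhere relative to them.
    System.out.println(log);
  }
}
```

Because each note has its own worker thread, a long paragraph in one notebook does not delay paragraphs in another, while paragraphs of the same notebook keep their submission order.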
            >
            >             Regards,
            >             -Pranav.
            >
            >
            >             On 17/08/15 5:38 am, moon soo Lee wrote:
            >>             Pranav, proposal looks awesome!
            >>
            >>             I have a question and feedback,
            >>
            >>             You said you tested 1,2 and 3. To create
            SparkIMain per
            >>             notebook, you need information of notebook
            id. Did you
            >>             get it from InterpreterContext?
            >>             Then how did you handle destroying SparkIMain (when a
            >>             notebook is deleted)?
            >>             As far as I know, the interpreter is not able to get
            >>             information about notebook deletion.
            >>
            >>             >> 4. Build a queue inside interpreter to
            allow only one
            >>             paragraph execution
            >>             >> at a time per notebook.
            >>
            >>             One downside of this approach is, GUI will
            display
            >>             RUNNING instead of PENDING for jobs inside
            of queue in
            >>             interpreter.
            >>
            >>             Best,
            >>             moon
            >>
            >>             On Sun, Aug 16, 2015 at 12:55 AM IT CTO <goi....@gmail.com> wrote:
            >>
            >>                 +1 for "to re-factor the Zeppelin
            architecture so
            >>                 that it can handle multi-tenancy easily"
            >>
            >>                 On Sun, Aug 16, 2015 at 9:47 AM DuyHai Doan
            >>                 <doanduy...@gmail.com> wrote:
            >>
            >>                     Agree with Joel, we may think to re-factor the
            >>                     Zeppelin architecture so that it can handle
            >>                     multi-tenancy easily. The technical solution
            >>                     proposed by Pranav is great but it only applies
            >>                     to Spark. Right now, each interpreter has to
            >>                     manage multi-tenancy its own way. Ultimately
            >>                     Zeppelin can propose a multi-tenancy
            >>                     contract/info (like UserContext, similar to
            >>                     InterpreterContext) so that each interpreter can
            >>                     choose to use or not.
            >>
            >>
            >>                     On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano
            >>                     <djo...@gmail.com> wrote:
            >>
            >>                         While I think the idea of running multiple
            >>                         notes simultaneously is great, it is really
            >>                         dancing around the lack of true multi-user
            >>                         support in Zeppelin. While the proposed
            >>                         solution would work if the application's
            >>                         resources are those of the whole cluster, if
            >>                         the app is limited (say it has 8 cores of
            >>                         16, with some distribution in memory) then
            >>                         potentially your note can hog all the
            >>                         resources and the scheduler will have to
            >>                         throttle all other executions, leaving you
            >>                         exactly where you are now.
            >>                         While I think the solution is a good one,
            >>                         maybe this question makes us think about adding
            >>                         true multi-user support,
            >>                         where we isolate resources (cluster and the
            >>                         notebooks themselves), have separate
            >>                         login/identity and (I don't know if it's
            >>                         possible) share the same context.
            >>
            >>                         Thanks,
            >>                         Joel
            >>
            >>                         > On Aug 15, 2015, at 1:58 PM, Rohit Agarwal
            >>                         > <mindpri...@gmail.com> wrote:
            >>                         >
            >>                         > If the problem is that multiple users have to wait for each other while
            >>                         > using Zeppelin, the solution already exists: they can create a new
            >>                         > interpreter by going to the interpreter page and attach it to their
            >>                         > notebook - then they don't have to wait for others to submit their job.
            >>                         >
            >>                         > But I agree, having paragraphs from one note wait for paragraphs from other
            >>                         > notes is a confusing default. We can get around that in two ways:
            >>                         >
            >>                         >  1. Create a new interpreter for each note and attach that interpreter to
            >>                         >  that note. This approach would require the least amount of code changes but
            >>                         >  is resource heavy and doesn't let you share Spark Context between different
            >>                         >  notes.
            >>                         >  2. If we want to share the Spark Context between different notes, we can
            >>                         >  submit jobs from different notes into different fairscheduler pools
            >>                         >  (https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application).
            >>                         >  This can be done by submitting jobs from different notes in different
            >>                         >  threads. This will make sure that jobs from one note are run sequentially
            >>                         >  but jobs from different notes will be able to run in parallel.
            >>                         >
            >>                         > Neither of these options requires any change
            >>                         in the Spark code.
            >>                         >
            >>                         > --
            >>                         > Thanks & Regards
            >>                         > Rohit Agarwal
            >>                         >
            https://www.linkedin.com/in/rohitagarwal003
            >>                         >
            >>                         > On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal
            >>                         > <praag...@gmail.com> wrote:
            >>                         >
            >>  >> If someone can share about the idea of
            >>                         sharing single SparkContext through
            >>  >>> multiple SparkILoop safely, it'll be
            >>                         really helpful.
            >>  >> Here is a proposal:
            >>  >> 1. In Spark code, change SparkIMain.scala to allow setting the virtual
            >>  >> directory. While creating new instances of SparkIMain per notebook from
            >>  >> zeppelin spark interpreter set all the instances of SparkIMain to the same
            >>  >> virtual directory.
            >>  >> 2. Start HTTP server on that virtual directory and set this HTTP server in
            >>  >> Spark Context using classserverUri method
            >>  >> 3. Scala generated code has a notion of packages. The default package name
            >>  >> is "line$<linenumber>". Package name can be controlled using System
            >>  >> Property scala.repl.name.line. Setting this property to "notebook id"
            >>  >> ensures that code generated by individual instances of SparkIMain is
            >>  >> isolated from other instances of SparkIMain
            >>  >> 4. Build a queue inside interpreter to allow only one paragraph execution
            >>  >> at a time per notebook.
            >>  >>
            >>  >> I have tested 1, 2, and 3 and it seems to provide isolation across
            >>  >> classnames. I'll work towards submitting a formal patch soon - Is there any
            >>  >> Jira already for the same that I can uptake? Also I need to understand:
            >>  >> 1. How does Zeppelin uptake Spark fixes? OR do I need to first work
            >>  >> towards getting Spark changes merged in Apache Spark github?
            >>  >>
            >>  >> Any suggestions or comments on the proposal are highly welcome.
            >>  >>
            >>  >> Regards,
            >>  >> -Pranav.
            >>  >>
            >>  >>> On 10/08/15 11:36 pm, moon soo Lee wrote:
            >>  >>>
            >>  >>> Hi piyush,
            >>  >>>
            >>  >>> A separate instance of SparkILoop and SparkIMain for each notebook while
            >>  >>> sharing the SparkContext sounds great.
            >>  >>>
            >>  >>> Actually, I tried to do it, and found the problem that multiple SparkILoop
            >>  >>> instances could generate the same class name, and the Spark executor
            >>  >>> confuses class names since they're reading classes from a single
            >>  >>> SparkContext.
            >>  >>>
            >>  >>> If someone can share about the idea of
            >>                         sharing single SparkContext
            >>  >>> through multiple SparkILoop safely, it'll
            >>                         be really helpful.
            >>  >>>
            >>  >>> Thanks,
            >>  >>> moon
            >>  >>>
            >>  >>>
            >>  >>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform)
            >>  >>> <piyush.muk...@flipkart.com> wrote:
            >>  >>>
            >>  >>>    Hi Moon,
            >>  >>>    Any suggestion on this? We have to wait a lot when multiple people
            >>  >>>    are working with Spark.
            >>  >>>    Can we create a separate instance of SparkILoop, SparkIMain and
            >>  >>>    print streams for each notebook while sharing the SparkContext,
            >>  >>>    ZeppelinContext, SQLContext and DependencyResolver, and then use the
            >>  >>>    parallel scheduler?
            >>  >>> thanks
            >>  >>>
            >>  >>> -piyush
            >>  >>>
            >>  >>>    Hi Moon,
            >>  >>>
            >>  >>>    How about tracking dedicated
            >>  SparkContext for a notebook in Spark's
            >>  >>> remote interpreter - this will allow
            >>  multiple users to run their spark
            >>  >>> paragraphs in parallel. Also, within a
            >>  notebook only one paragraph is
            >>  >>> executed at a time.
            >>  >>>
            >>  >>> Regards,
            >>  >>> -Pranav.
            >>  >>>
            >>  >>>
            >>  >>>> On 15/07/15 7:15 pm, moon soo Lee wrote:
            >>  >>>> Hi,
            >>  >>>>
            >>  >>>> Thanks for asking question.
            >>  >>>>
            >>  >>>> The reason is simply that it is running code statements. The
            >>  >>>> statements can have order and dependency. Imagine I have two
            >>  >>>> paragraphs:
            >>  >>>>
            >>  >>>> %spark
            >>  >>>> val a = 1
            >>  >>>>
            >>  >>>> %spark
            >>  >>>> print(a)
            >>  >>>>
            >>  >>>> If they're not run one by one, they may run in
            >>  >>>> random order and the output will not always be the same: either '1' or
            >>  >>>> 'val a cannot be found'.
            >>  >>>>
            >>  >>>> This is the reason why. But if there is a nice idea to handle this
            >>  >>>> problem, I agree that using a parallel scheduler would help a lot.
            >>  >>>>
            >>  >>>> Thanks,
            >>  >>>> moon
            >>  >>>> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng
            >>  >>>> <linxizeng0...@gmail.com> wrote:
            >>  >>>>
            >>  >>>> Does anyone have the same question as me? Or is this not a
            >>  >>>> question?
            >>  >>>>
            >>  >>>> 2015-07-14 11:47 GMT+08:00 linxi zeng
            >>  >>>> <linxizeng0...@gmail.com>:
            >>  >>>>
            >>  >>>>     hi, Moon:
            >>  >>>>        I notice that the getScheduler function in
            >>  >>>>     SparkInterpreter.java returns a FIFOScheduler, which makes the
            >>  >>>>     Spark interpreter run Spark jobs one by one. It's not a good
            >>  >>>>     experience when a couple of users do some work on Zeppelin at
            >>  >>>>     the same time, because they have to wait for each other.
            >>  >>>>     At the same time, SparkSqlInterpreter can choose which
            >>  >>>>     scheduler to use via "zeppelin.spark.concurrentSQL".
            >>  >>>>     My question is: what considerations was this decision
            >>  >>>>     based on?
            >>  >>>
            >>  >>>
            >>  >>>
            >>  >>>
            >>  >>>
            >>
             
------------------------------------------------------------------------------------------------------------------------------------------
            >>  >>>
            >>  >>>    This email and any files transmitted
            >>                         with it are confidential and
            >>  >>> intended solely for the use of the
            >>  individual or entity to whom
            >>  >>>    they are addressed. If you have
            >>  received this email in error
            >>  >>> please notify the system manager. This
            >>                         message contains
            >>  >>> confidential information and is intended
            >>                         only for the individual
            >>  >>> named. If you are not the named addressee
            >>                         you should not
            >>  >>> disseminate, distribute or copy this
            >>                         e-mail. Please notify the
            >>  >>> sender immediately by e-mail if you have
            >>  received this e-mail by
            >>  >>> mistake and delete this e-mail from your
            >>                         system. If you are not
            >>  >>>    the intended recipient you are
            >>  notified that disclosing, copying,
            >>  >>> distributing or taking any action in
            >>  reliance on the contents of
            >>  >>>    this information is strictly
            >>  prohibited. Although Flipkart has
            >>  >>> taken reasonable precautions to ensure no
            >>                         viruses are present in
            >>  >>>    this email, the company cannot accept
            >>  responsibility for any loss
            >>  >>>    or damage arising from the use of this
            >>                         email or attachments
            >>  >>
            >>
            >>
            >



    
