Moon - I would be curious to hear your thoughts on my email from April 12th, quoted below (a rough sketch of the aliasing idea in interpreter.json form follows the quoted thread).
John

On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <j...@omernik.com> wrote:

> I would actually argue that if the user doesn't have access to the same or
> a similar interpreter.json file, then notebook file portability is a moot
> point. For example, if I set up %spark or %jdbc in my environment and
> create a notebook, that notebook is not any more or less portable than if
> I had %myspark or %drill (a jdbc interpreter). Mainly, because if someone
> tries to open that notebook and they don't have my setup of %spark or of
> %jdbc, they can't run the notebook. If we could allow the user to create
> an alias for an instance of an interpreter, and that alias information was
> stored in interpreter.json, then the portability of the notebook would be
> essentially the same.
>
> Said another way:
>
> Static interpreter invocation (%jdbc, %pyspark, %psql):
> - This notebook is 100% dependent on interpreter.json in order to run.
> %jdbc may point to Drill, %pyspark may point to an authenticated YARN
> instance (specific to the user/org), %psql may point to an authenticated
> Postgres instance unique to the org/user. Without interpreter.json, this
> notebook is not portable.
>
> Aliased interpreter invocation stored in interpreter.json (%drill ->
> jdbc with settings, %datasciencespark -> pyspark for the data science
> group, %entdw -> postgres server, enterprise data warehouse):
> - This notebook is still 100% dependent on the interpreter.json file in
> order to run. There is no more or less dependence on interpreter.json
> (if these aliases are stored there) than there is if Zeppelin uses
> static interpreter invocation, thus portability is not a benefit of the
> static method, and the aliased method can provide a good deal of analyst
> agility/definition in a multi data set/source environment.
>
> My thought is we should allow people to create new interpreters of known
> types, and on creation of these interpreters allow the invocation to be
> stored in interpreter.json. Also, if a new interpreter is registered, it
> would follow the same interpreter group methodology. Thus if I set up a
> new %spark as %entspark, then the sub-interpreters (pyspark, sparksql,
> etc.) would still be there, have access to the parent entspark, and could
> also be renamed, and the access a sub-interpreter has to its interpreter
> group would be based on the parent-child relationship, not just on the
> name...
>
> Thoughts?
>
> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com> wrote:
>
>> Thanks moon - it is good to know the ideas behind the design. It makes a
>> lot more sense to use system-defined identifiers in order to make the
>> notebook portable.
>>
>> Currently, I can name the interpreter in the WebUI, but the name doesn't
>> actually help distinguish between my spark interpreters, which is quite
>> confusing to me. I am not sure whether this would be a better way:
>> --
>> 1. the UI generates the default identifier for the first spark
>> interpreter, which is %spark
>> 2. when the user creates another spark interpreter, the UI asks the user
>> to provide a user-defined identifier
>>
>> Zhong
>>
>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>>
>>> In the initial stage of development, there was discussion about whether
>>> xxx in %xxx should be a user-defined interpreter identifier or a static
>>> interpreter identifier.
>>>
>>> We decided to go with the latter, because we wanted to keep the notebook
>>> file portable, i.e. let a note.json file imported from another Zeppelin
>>> instance run without (or with minimal) modification.
>>>
>>> If we used user-defined identifiers, running an imported notebook would
>>> not be very simple. This is why %xxx does not use a user-defined
>>> interpreter identifier at the moment.
>>>
>>> If you have any other thoughts or ideas, please feel free to share.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>
>>>> Thanks, Moon! I got it working. The reason it didn't work before is
>>>> that I tried to use both spark interpreters inside one notebook. I
>>>> think I can create a different notebook for each interpreter, but it
>>>> would be great if we could use "%xxx", where xxx is the user-defined
>>>> interpreter identifier, to select different interpreters for different
>>>> paragraphs.
>>>>
>>>> Besides, because both interpreters currently use "spark" as the
>>>> identifier, they share the same log file. I am not sure whether there
>>>> are other cases where they interfere with each other.
>>>>
>>>> Thanks,
>>>> Zhong
>>>>
>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Once you create another spark interpreter in the Interpreter menu of
>>>>> the GUI, each notebook should be able to select and use it (setting
>>>>> icon in the top right corner of each notebook).
>>>>>
>>>>> If it does not work, could you look for an error message in the log
>>>>> file?
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>
>>>>>> Hi zeppelin pilots,
>>>>>>
>>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>>> clusters.
>>>>>>
>>>>>> Another useful use case is to run one instance in cluster mode and
>>>>>> another in local mode, which significantly boosts the performance of
>>>>>> small data analysis.
>>>>>>
>>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>>> create another spark interpreter with a different identifier, which
>>>>>> the UI allows, but it doesn't work (shall I file a ticket?).
>>>>>>
>>>>>> I am now trying to run multiple SparkContexts in the same spark
>>>>>> interpreter.
>>>>>>
>>>>>> Zhong
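To make the alias idea from the quoted thread a bit more concrete, here is a rough, simplified sketch of what aliased entries in interpreter.json might look like. This is not the actual interpreter.json schema (which differs between Zeppelin versions and carries more fields per entry); the IDs, URLs, and property values are hypothetical, and the "name" field is standing in for the proposed alias:

    {
      "interpreterSettings": {
        "hypothetical-id-1": {
          "name": "drill",
          "group": "jdbc",
          "properties": {
            "default.driver": "org.apache.drill.jdbc.Driver",
            "default.url": "jdbc:drill:zk=zk.example.com:2181/drill/cluster"
          }
        },
        "hypothetical-id-2": {
          "name": "datasciencespark",
          "group": "spark",
          "properties": {
            "master": "yarn-client"
          }
        }
      }
    }

A paragraph would then start with %drill or %datasciencespark instead of %jdbc or %pyspark, and the resulting notebook would be exactly as portable (or not) as one using the static names, since both depend on the same interpreter.json.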