I would actually argue that if the user doesn't have access to the same or a similar interpreter.json file, then notebook file portability is a moot point. For example, if I set up %spark or %jdbc in my environment and create a notebook, that notebook is no more or less portable than if I had used %myspark or %drill (a jdbc interpreter), mainly because if someone opens that notebook without my setup of %spark or %jdbc, they can't run it. If we allowed the user to create an alias for an instance of an interpreter, and that alias information were stored in interpreter.json, then the portability of the notebook would be essentially the same.
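To make that concrete, something along these lines in interpreter.json is roughly what I have in mind. This is a purely hypothetical sketch, not the current schema: the alias names, the "group" field, and the property keys and values are made up for illustration.

  {
    "entspark": {
      "group": "spark",
      "properties": {
        "master": "yarn-client",
        "spark.executor.memory": "4g"
      }
    },
    "drill": {
      "group": "jdbc",
      "properties": {
        "default.driver": "org.apache.drill.jdbc.Driver",
        "default.url": "jdbc:drill:zk=<your-zk-quorum>"
      }
    }
  }

The point is only that the alias-to-settings mapping lives in the same file the notebook already depends on, so importing a note that uses %drill or %entspark is no harder than importing one that uses %jdbc or %spark.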
Said another way:

Static interpreter invocation (%jdbc, %pyspark, %psql): this notebook is 100% dependent on interpreter.json in order to run. %jdbc may point to Drill, %pyspark may point to an authenticated YARN instance (specific to the user/org), %psql may point to an authenticated Postgres instance unique to the org/user. Without interpreter.json, this notebook is not portable.

Aliased interpreter invocation, with the aliases stored in interpreter.json (%drill -> jdbc with Drill settings, %datasciencespark -> pyspark for the data science group, %entdw -> the Postgres server for the enterprise data warehouse): this notebook is still 100% dependent on interpreter.json in order to run. There is no more or less dependence on interpreter.json (if the aliases are stored there) than there is with static interpreter invocation, so portability is not a benefit of the static method, and the aliased method can give analysts a good deal of agility/definition in a multi-dataset, multi-source environment.

My thought is we should allow people to create new interpreters of known types, and on creation of these interpreters store the invocation in interpreter.json. Also, if a new interpreter is registered, it would follow the same interpreter group methodology. Thus if I set up a new %spark as %entspark, the sub interpreters (pyspark, sparksql, etc.) would still be there, would have access to the parent entspark, and could also be renamed. So a sub interpreter can be renamed, and the access it has to the interpreter group is based on the parent-child relationship, not just on the name... Thoughts?

On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com> wrote:

> Thanks moon - it is good to know the ideas behind the design. It makes a
> lot more sense to use system-defined identifiers in order to make the
> notebook portable.
>
> Currently, I can name the interpreter in the WebUI, but actually the name
> doesn't help to distinguish between my spark interpreters, which is quite
> confusing to me. I am not sure whether this would be a better way:
> --
> 1. the UI generates the default identifier for the first spark
> interpreter, which is %spark
> 2. when the user creates another spark interpreter, the UI asks the user
> to provide a user-defined identifier
>
> Zhong
>
> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>
>> In the initial stage of development, there was discussion about %xxx:
>> whether xxx should be a user-defined interpreter identifier or a static
>> interpreter identifier.
>>
>> We decided to go with the latter, because we wanted to keep the notebook
>> file portable, i.e. let a note.json file imported from another Zeppelin
>> instance run without (or with minimal) modification.
>>
>> If we used a user-defined identifier, running an imported notebook would
>> not be very simple. This is why %xxx is not using a user-defined
>> interpreter identifier at the moment.
>>
>> If you have any other thoughts or ideas, please feel free to share.
>>
>> Thanks,
>> moon
>>
>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com>
>> wrote:
>>
>>> Thanks, Moon! I got it working. The reason it didn't work is that I
>>> tried to use both of the spark interpreters inside one notebook. I think
>>> I can create different notebooks for each interpreter, but it would be
>>> great if we could use "%xxx", where xxx is the user-defined interpreter
>>> identifier, to identify different interpreters for different paragraphs.
>>>
>>> Besides, because currently both of the interpreters are using "spark"
>>> as the identifier, they share the same log file. I am not sure whether
>>> there are other cases where they interfere with each other.
>>>
>>> Thanks,
>>> Zhong
>>>
>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Once you create another spark interpreter in the Interpreter menu of
>>>> the GUI, each notebook should be able to select and use it (setting
>>>> icon on the top right corner of each notebook).
>>>>
>>>> If it does not work, could you find the error message in the log file?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wangzhong....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi zeppelin pilots,
>>>>>
>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>> clusters.
>>>>>
>>>>> Another useful use case is to run one instance in cluster mode and
>>>>> another in local mode. This can significantly boost the performance
>>>>> of small-data analysis.
>>>>>
>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>> create another spark interpreter with a different identifier, which
>>>>> is allowed in the UI, but it doesn't work (shall I file a ticket?).
>>>>>
>>>>> I am now trying to run multiple SparkContexts in the same spark
>>>>> interpreter.
>>>>>
>>>>> Zhong
>>>>>
>>>>>
>>>
>