Re: Extending SparkInterpreter functionality

Jeff Zhang Thu, 01 Feb 2018 22:15:38 -0800

1) Spark UI which works differently on EMR than standalone, so that logic
will be in an interpreter specific to EMR.
   Could you create a ticket for that and add the details? I don't know
exactly what the difference is between EMR and standalone, but we can expose
an API to allow customization if necessary.
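
For illustration only, one possible shape for such an API is a small
interface that the interpreter consults when it needs the Spark UI address.
Every name below (SparkUiUrlProvider, PrefixSparkUiUrlProvider, the method
signature) is hypothetical, and the prefix-based URL is a placeholder rather
than a description of how EMR actually behaves.

```java
// Hypothetical extension point; nothing in this thread says Zeppelin has it.
interface SparkUiUrlProvider {
    // Returns the URL to show for the given application.
    String sparkUiUrl(String defaultUrl, String applicationId);
}

// Default behaviour: keep the URL the interpreter already computed.
final class DefaultSparkUiUrlProvider implements SparkUiUrlProvider {
    @Override
    public String sparkUiUrl(String defaultUrl, String applicationId) {
        return defaultUrl;
    }
}

// Placeholder for a cluster-specific implementation. The URL layout is
// invented for this sketch and is an assumption, not real EMR behaviour.
final class PrefixSparkUiUrlProvider implements SparkUiUrlProvider {
    private final String prefix;

    PrefixSparkUiUrlProvider(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String sparkUiUrl(String defaultUrl, String applicationId) {
        return prefix + "/" + applicationId;
    }
}

class SparkUiUrlProviderDemo {
    public static void main(String[] args) {
        SparkUiUrlProvider provider = new PrefixSparkUiUrlProvider("http://proxy.example");
        System.out.println(provider.sparkUiUrl("http://localhost:4040", "app-1"));
    }
}
```

A user would ship an implementation in their own jar, and the interpreter
would pick it instead of the default one.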



2) We want to add more metrics & logs in the interpreter, say number of
requests coming to the interpreter.
   Could you create a ticket for that as well? I think it is not difficult
to do.
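
As a rough sketch of the "number of requests" metric, a wrapper can count
calls before handing them to the real interpret logic. CountingInterpreter
and its methods are hypothetical names, and the Function<String, String>
delegate only stands in for an interpreter's interpret call; this is not
Zeppelin's API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical wrapper that counts requests passing through it.
final class CountingInterpreter {
    // Stand-in for the real interpret call (an assumption for this sketch).
    private final Function<String, String> delegate;
    private final AtomicLong requests = new AtomicLong();

    CountingInterpreter(Function<String, String> delegate) {
        this.delegate = delegate;
    }

    String interpret(String code) {
        // Count first, so requests that later throw are still counted.
        requests.incrementAndGet();
        return delegate.apply(code);
    }

    long requestCount() {
        return requests.get();
    }
}

class CountingInterpreterDemo {
    public static void main(String[] args) {
        CountingInterpreter intp = new CountingInterpreter(code -> "ran: " + code);
        intp.interpret("sc.version");
        intp.interpret("1 + 1");
        System.out.println(intp.requestCount()); // prints 2
    }
}
```

AtomicLong keeps the count correct when several paragraphs run concurrently.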

3) Ideally we would like to connect to different spark clusters in
spark-submit and not tie to one which happens on Zeppelin startup right now.
   This is already possible: you can create a separate spark interpreter for
each spark cluster. E.g. you can create spark_16 for spark 1.6 and spark_22
for spark 2.2; all you need to do is set SPARK_HOME properly in each
interpreter's setting.


Ankit Jain <[email protected]> wrote on Fri, Feb 2, 2018 at 1:36 PM:

> This is exactly what we want Jeff! A hook to plug in our own interpreters.
> (I am on same team as Jhon btw)
>
> Right now there are too many concrete references and injecting stuff is
> not possible.
>
> Examples of customizations:
> 1) Spark UI which works differently on EMR than standalone, so that logic
> will be in an interpreter specific to EMR.
> 2) We want to add more metrics & logs in the interpreter, say number of
> requests coming to the interpreter.
> 3) Ideally we would like to connect to different spark clusters in
> spark-submit and not tie to one which happens on Zeppelin startup right now.
>
> Basically we want to add a lot more flexibility.
>
> We are building a platform to cater to multiple clients. So, multiple
> Zeppelin instances, multiple spark clusters, multiple Spark UIs and on top
> of that maintaining the security and privacy in a shared multi-tenant env
> will need all the flexibility we can get!
>
> Thanks
> Ankit
>
> On Feb 1, 2018, at 7:51 PM, Jeff Zhang <[email protected]> wrote:
>
>
> Hi Jhon,
>
> Would you mind sharing what kind of custom functionality you want to add to
> the spark interpreter? One idea I have is that we could add extension points
> to the existing SparkInterpreter, so that users can enhance SparkInterpreter
> through them. That means we just open up some interfaces; users implement
> those interfaces and add their jars to the spark interpreter folder.
>
>
>
> Jhon Anderson Cardenas Diaz <[email protected]> wrote on Fri, Feb 2, 2018
> at 5:30 AM:
>
>> Hello!
>>
>> I'm a software developer and, as part of a project, I need to extend the
>> functionality of SparkInterpreter without modifying it. Instead, I need to
>> create a new interpreter that extends it or wraps its functionality.
>>
>> I also need the spark sub-interpreters to use my new custom interpreter,
>> and this is where the problem arises: the spark sub-interpreters have a
>> direct dependency on the spark interpreter, because they use its class
>> name to obtain its instance:
>>
>>
>>     private SparkInterpreter getSparkInterpreter() {
>>         ...
>>         Interpreter p = getInterpreterInTheSameSessionByClassName(
>>             SparkInterpreter.class.getName());
>>     }
>>
>>
>> *Approach without modifying Apache Zeppelin*
>>
>> My current approach is to create a SparkCustomInterpreter that overrides
>> the getClassName method as follows:
>>
>> public class SparkCustomInterpreter extends SparkInterpreter {
>>     ...
>>
>>     @Override
>>     public String getClassName() {
>>         return SparkInterpreter.class.getName();
>>     }
>> }
>>
>>
>> and put the new class name in the interpreter-setting.json file of spark:
>>
>> [
>>   {
>>     "group": "spark",
>>     "name": "spark",
>>     "className": "org.apache.zeppelin.spark.SparkCustomInterpreter",
>>     ...
>>     "properties": {...}
>>   }, ...
>> ]
>>
>>
>> The problem with this approach is that running a paragraph fails. In
>> general it fails because Zeppelin uses both the instance's actual class
>> name and the getClassName() method to access the instance, and the two no
>> longer agree, which causes many problems.
>>
>> *Approaches modifying Apache Zeppelin*
>>
>> There are two possible solutions, both related to the way in which the
>> sub-interpreters get the SparkInterpreter instance. One is to get the
>> class name from a property:
>>
>>
>>     private SparkInterpreter getSparkInterpreter() {
>>         ...
>>         // changed: the class name is read from a property, with a default
>>         Interpreter p = getInterpreterInTheSameSessionByClassName(
>>             property.getProperty("zeppelin.spark.mainClass",
>>                 SparkInterpreter.class.getName()));
>>     }
>>
>> The other possibility is to modify the method
>> Interpreter.getInterpreterInTheSameSessionByClassName(String) so that it
>> returns the instance whose class, or whose superclass, has the class name
>> given in the parameter:
>>
>>
>> @ZeppelinApi
>> public Interpreter getInterpreterInTheSameSessionByClassName(String className) {
>>   synchronized (interpreterGroup) {
>>     for (List<Interpreter> interpreters : interpreterGroup.values()) {
>>       ....
>>       for (Interpreter intp : interpreters) {
>>         // changed: also accept an interpreter whose direct superclass matches
>>         if (intp.getClassName().equals(className)
>>             || intp.getClass().getSuperclass().getName().equals(className)) {
>>           interpreterFound = intp;
>>         }
>>
>>         ...
>>       }
>>
>>       ...
>>     }
>>   }
>>   return null;
>> }
>>
>>
>> Either of the two solutions would involve modifying Apache Zeppelin code.
>> Do you think the change could be contributed to the community? Or do you
>> know of some other approach to change the way in which the spark
>> sub-interpreters get the instance of the spark interpreter?
>>
>> I'll be attentive to any information about this.
>>
>> Greetings
>>
>>
>> Jhon
>>
>
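
On the second approach in the original message: the proposed condition only
checks the direct superclass, so a class two levels below SparkInterpreter
would not match. A minimal sketch of a lookup that walks the whole class
hierarchy follows. The types here (BaseInterpreter, MidInterpreter,
CustomInterpreter, InterpreterLookup) are hypothetical stand-ins, not
Zeppelin classes. If the objects in the interpreter group are wrappers
around the real interpreters, they would have to be unwrapped first, which
this sketch does not model.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in hierarchy for the sketch; these are not Zeppelin's classes.
class BaseInterpreter {}
class MidInterpreter extends BaseInterpreter {}
class CustomInterpreter extends MidInterpreter {}

final class InterpreterLookup {
    // True if obj's class, or any ancestor class, has the given name.
    static boolean matchesClassOrAncestor(Object obj, String className) {
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getName().equals(className)) {
                return true;
            }
        }
        return false;
    }

    // Returns the first matching element, or null when nothing matches.
    static Object findByClassName(List<?> interpreters, String className) {
        for (Object intp : interpreters) {
            if (matchesClassOrAncestor(intp, className)) {
                return intp;
            }
        }
        return null;
    }
}

class InterpreterLookupDemo {
    public static void main(String[] args) {
        List<Object> group = Arrays.asList("not an interpreter", new CustomInterpreter());
        Object found = InterpreterLookup.findByClassName(group, BaseInterpreter.class.getName());
        System.out.println(found instanceof CustomInterpreter); // prints true
    }
}
```

Matching by name rather than with instanceof keeps the lookup usable when
the caller only has the class name as a string, as in the method quoted
above.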
