Re: [ANNOUNCE] Apache Zeppelin 0.7.2 released

2017-06-14 Thread Vinay Shukla
Congratulations, Apache Zeppelin community.

I am sure our users will appreciate your efforts and this release.

Let's make Zeppelin great.


On Tue, Jun 13, 2017 at 10:08 AM Jun Kim  wrote:

> Cool! Thanks for your work Mina :)
>
> On Wed, Jun 14, 2017 at 2:06 AM, Mina Lee wrote:
>
>> The Apache Zeppelin community is pleased to announce the availability of
>> the 0.7.2 release.
>>
>> Zeppelin is a collaborative data analytics and visualization tool for
>> distributed, general-purpose data processing systems such as Apache Spark,
>> Apache Flink, etc.
>>
>> The community put significant effort into improving Apache Zeppelin since
>> the last release. 25 contributors provided 50+ patches
>> for improvements and bug fixes, and more than 40 issues have been resolved.
>>
>> We encourage you to download the latest release from
>> http://zeppelin.apache.org/download.html
>>
>> The release note is available at
>> http://zeppelin.apache.org/releases/zeppelin-release-0.7.2.html
>>
>> We welcome your help and feedback. For more information on the project and
>> how to get involved, visit our website at http://zeppelin.apache.org/
>>
>> Thank you to all the users and contributors who have helped to improve Apache
>> Zeppelin.
>>
>> Regards,
>> The Apache Zeppelin community
>>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
>


Re: Zeppelin 0.6.0 (on EMR) stops responding after several runs (NPE error), and back online after restart zeppelin

2017-06-14 Thread Jeff Zhang
Hi Richard,

Is it possible for you to upgrade to 0.6.2 or 0.7.1?

0.6.0 has some critical issues.



Richard Xin wrote on Thu, Jun 15, 2017 at 5:02 AM:

> and I also saw a stack overflow issue:
> Caused by: java.lang.StackOverflowError
> at scala.tools.nsc.transform.LambdaLift$$anon$1.apply(LambdaLift.scala:30)
> at scala.reflect.internal.tpe.TypeMaps$TypeMap.mapOver(TypeMaps.scala:110)
> at scala.tools.nsc.transform.LambdaLift$$anon$1.apply(LambdaLift.scala:30)
> at scala.reflect.internal.tpe.TypeMaps$TypeMap.mapOver(TypeMaps.scala:110)
> ... (these two frames repeat until the stack limit is reached)
>
>
> On Wednesday, June 14, 2017, 12:38:30 PM PDT, Richard Xin <richardxin...@yahoo.com> wrote:
>
>
> It has happened several times already; Zeppelin worked again after a restart.
>
> I consistently see a similar error when it dies:
>
> ERROR [2017-06-14 17:59:59,705] ({pool-2-thread-2} SparkInterpreter.java[putLatestVarInResourcePool]:1253) -
> java.lang.NullPointerException
> at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
> at org.apache.zeppelin.spark.SparkInterpreter.getLastObject(SparkInterpreter.java:1114)
> at org.apache.zeppelin.spark.SparkInterpreter.putLatestVarInResourcePool(SparkInterpreter.java:1249)
> at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1232)
> at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1144)
> at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1137)
> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
> at

Zeppelin 0.6.0 (on EMR) stops responding after several runs (NPE error), and back online after restart zeppelin

2017-06-14 Thread Richard Xin
It has happened several times already; Zeppelin worked again after a restart.
I consistently see a similar error when it dies:
ERROR [2017-06-14 17:59:59,705] ({pool-2-thread-2} SparkInterpreter.java[putLatestVarInResourcePool]:1253) -
java.lang.NullPointerException
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at org.apache.zeppelin.spark.SparkInterpreter.getLastObject(SparkInterpreter.java:1114)
at org.apache.zeppelin.spark.SparkInterpreter.putLatestVarInResourcePool(SparkInterpreter.java:1249)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1232)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1144)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1137)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)



Re: [DISCUSSION] Extending TableData API

2017-06-14 Thread Jeff Zhang
>>> But not sure about how other interpreters can do the same thing. (e.g.
trivial, but let's think about the shell interpreter, which keeps its table data
in memory)

The approach I proposed is general across all interpreters. All we need
to do is add one method to RemoteInterpreterProcess that other
interpreters can use to fetch resources.
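
A minimal sketch of what I mean (the class body and method signature here are
hypothetical, for illustration only, not the existing Zeppelin API):

    import java.nio.ByteBuffer;

    // Hypothetical sketch -- not the real RemoteInterpreterProcess class.
    public abstract class RemoteInterpreterProcess {
      /**
       * Fetch a resource registered in this interpreter process by name.
       * Every interpreter goes through this single generic method, so no
       * interpreter-specific wiring is needed on the ZeppelinServer side.
       * Returns null when nothing is registered under the given name.
       */
      public abstract ByteBuffer fetchResource(String resourceName);
    }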

>>> Some people might wonder why we do not use external storages to persist
(large) table resources instead of keeping them in memory of ZeppelinServer.

It is fine to use memory for now, but we should leave an interface there
for other storages. For now we could have just MemoryStorage, and add
other implementations in the future.
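
Something like this, as a minimal sketch (both type names are made up for
illustration):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // ZeppelinServer codes against the interface; MemoryStorage is the only
    // implementation for now, and disk- or external-store-backed ones can be
    // added later without touching the callers.
    public interface ResourceStorage {
      void put(String name, Object resource);
      Object get(String name);
      void remove(String name);
    }

    class MemoryStorage implements ResourceStorage {
      private final Map<String, Object> resources = new ConcurrentHashMap<>();

      @Override public void put(String name, Object resource) { resources.put(name, resource); }
      @Override public Object get(String name) { return resources.get(name); }
      @Override public void remove(String name) { resources.remove(name); }
    }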


Park Hoon <1am...@gmail.com> wrote on Wed, Jun 14, 2017 at 10:22 PM:

> @Jeff, Thanks for sharing your opinions and important questions.
>
>
> > Q1. What does the resource registration mean? IIUC, currently it means
> it would cache the data in the interpreter process. Then it might become a memory
> issue when more and more resources are registered. Maybe we could introduce
> a resource retention mechanism, or cache the data in other forms (just like
> the Spark table cache policy, where the user can specify how to cache the data:
> in memory, on disk, etc.)
>
> A1. It depends on the implementation of TableData for each interpreter.
> For example,
>
> if the JDBC interpreter keeps only the SQL of a paragraph to reproduce the
> table, we don't need to persist the whole table data in memory, the file
> system, or an external storage. That's what section 3.2 describes.
>
> [image: Inline image 2]
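>
> A rough sketch of the JDBC case (the class shape below is hypothetical and
> only illustrates the "keep only the query" idea):
>
>     import java.sql.Connection;
>     import java.sql.ResultSet;
>     import java.sql.SQLException;
>
>     // Hypothetical: persist only the query string; reproduce rows on demand.
>     public class JdbcTableData {
>       private final Connection connection; // interpreter's existing connection
>       private final String sql;            // the only state we keep
>
>       public JdbcTableData(Connection connection, String sql) {
>         this.connection = connection;
>         this.sql = sql;
>       }
>
>       // Reproduce the table by re-running the query instead of caching rows.
>       public ResultSet rows() throws SQLException {
>         return connection.createStatement().executeQuery(sql);
>       }
>     }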
>
>
>
>
> > Q2. The scope of resource sharing. For now, it seems it is globally
> shared, but I think user-level sharing might be more common. Then we need
> to create a namespace for each user. That means the same resource name
> could exist in different user namespaces.
>
> A2. Regarding the namespace concept, the proposal only describes what the
> table resource name should be (Section 5.3), not namespaces.
>
> The namespace could be the name of a note or something custom (e.g. a
> per-user namespace). We can discuss this.
>
> Personally, +1 for having namespaces because they are helpful for searching
> and sharing. This might be handled by `ResourceRegistry`.
>
>
> [image: Inline image 1]
>
>
> > Q3. The data route might cause a performance issue. From the diagram, if
> the Spark interpreter needs to access a resource from the JDBC interpreter, the
> data first needs to be sent to the Zeppelin server, and then the Zeppelin server
> sends the data to the Spark interpreter. This kind of data route introduces a bit
> more overhead, and the Zeppelin server will become a bottleneck and
> require large memory when there are many resources to be shared across
> users/interpreters. So I would suggest the following approach: Zeppelin
> Server just controls the metadata and ACL of resources, and the Spark
> interpreter fetches data from the JDBC interpreter directly instead of
> through the Zeppelin server. Here's the sequence:
>1). SparkInterpreter asks for the metadata and token for the resource.
>2). Zeppelin Server checks whether this SparkInterpreter has
> permission to access the resource; if yes, it sends the metadata and
> token to SparkInterpreter. The metadata includes the RPC address of the
> JdbcInterpreter, and the token is for security.
>3). SparkInterpreter asks JdbcInterpreter for the resource using the
> token and metadata received in step 2.
>4). JdbcInterpreter verifies the token and sends the data to
> SparkInterpreter.
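>
> In code, the proposed sequence might look roughly like this (every class,
> method, and resource name below is hypothetical, just to illustrate the flow):
>
>     // Steps 1-2: ask ZeppelinServer; it checks the ACL and, if permitted,
>     // returns the owner's RPC address plus a one-time token.
>     ResourceGrant grant = zeppelinServer.requestResource("jdbc:sales_q2", callerIdentity);
>
>     // Step 3: connect to the owning interpreter directly, bypassing the server.
>     InterpreterClient owner = InterpreterClient.connect(grant.getRpcAddress());
>
>     // Step 4: the owner verifies the token before streaming the data back.
>     TableData data = owner.fetchResource("jdbc:sales_q2", grant.getToken());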
>
> A3. +1 for the Spark interpreter accessing JDBC directly, since it's better for
> handling large data. But I'm not sure how other interpreters can do the
> same thing (e.g. trivial, but let's think about the shell interpreter, which
> keeps its table data in memory).
>
>
> --
>
> Some people might wonder why we do not use external storages to persist
> (large) table resources instead of keeping them in the memory of ZeppelinServer.
>
> The authors originally discussed whether to have an external storage or
> not. But having an external storage
>
> - requires additional (lots of) dependencies (Geode? Redis? HDFS? Which
> one should we use? Or support all of them?)
> - even with an external storage, we might not be able to persist 400 GB or 10 TB.
>
> Thus, the proposal was written to
>
> - utilize the interpreter's own storage (e.g. the Spark cluster for the Spark
> interpreter)
> - keep the minimal things needed to reproduce the table result (e.g. keeping
> only the query), without depending on an external storage at first.
>
>
> And now we are discussing it; I hope we can improve the proposal and turn it
> into a real implementation soon. :)
>
>
>
> Thanks.
>
>
>
>
> On Wed, Jun 14, 2017 at 12:20 PM, Jeff Zhang  wrote:
>
>>
>> Hi Park,
>>
>> Thanks for sharing; this is a very interesting and innovative idea. I
>> have several comments and concerns.
>>
>> 1. What does the resource registration mean?
>>IIUC, currently it means it would cache the data in the Interpreter
>> Process. Then it